James Thomas

Notes on JavaScript

Phonebot

Last month, a colleague was explaining that they were not looking forward to an afternoon of long-distance conference calls. Having recently started using Slack to collaborate with their remote team, they lamented…

I wish I could do my conference calls using Slack!

…which got us thinking.

Recent experiments with IBM Watson Speech To Text and Twilio on IBM Bluemix had shown how easy it was to create telephony applications. Slack publishes multiple APIs to help developers build custom “bots” that respond to channel content.

Could we create a new Slackbot that let users make phone calls using channel messages?

One month later, Phonebot was born!

Slackbot that lets users make phone calls within a Slack channel.
Users can dial a phone number, with the phone call audio converted to text and sent to the channel.
Channel message replies are converted to speech and sent over the phone call.

tl;dr Full source code for the project is available here. Follow the deployment instructions to run your own version.

Read on to find out how we used IBM Watson, Twilio and IBM Bluemix to develop our custom Slackbot…

Custom Slackbots

Slack publishes numerous APIs for integrating custom services. These APIs provide everything from sending simple messages as Slackbot to creating a real-time messaging service.

Phonebot will listen for messages that start with @phonebot and contain user commands, e.g. call, hangup. It will create new channel messages with the translated speech results, along with status messages. Users can issue the following commands to control Phonebot.

@phonebot call PHONE_NUMBER <-- Dials the phone number
@phonebot say TEXT <-- Sends text as speech to the call 
@phonebot hangup <-- Ends the active call
@phonebot verbose {on|off} <-- Toggle verbose mode
@phonebot duration NUMBER <-- Set recording duration
@phonebot help <-- Show all commands usage information 
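
As a rough illustration of how such messages could be turned into commands, here is a minimal sketch of a parser. The `parseCommand` function, the `COMMANDS` list and the returned object shape are assumptions made for this example, not Phonebot's actual code.

```javascript
// Hypothetical list of the commands shown above.
var COMMANDS = ['call', 'say', 'hangup', 'verbose', 'duration', 'help']

// Hypothetical helper: split a channel message into a command and its argument.
function parseCommand (text) {
  // Expect "@phonebot COMMAND [ARGUMENT...]"; anything else is ignored.
  var match = /^@phonebot\s+(\S+)\s*([\s\S]*)$/.exec(text.trim())
  if (!match) return null

  var command = match[1].toLowerCase()
  if (COMMANDS.indexOf(command) === -1) return null

  return { command: command, argument: match[2].trim() }
}

console.log(parseCommand('@phonebot call +15550100'))
console.log(parseCommand('@phonebot say Hello, can you hear me?'))
console.log(parseCommand('good morning everyone'))
```

Returning null for anything that is not a recognised command keeps the caller simple: it can ignore the message or reply with the help text.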

We use the Incoming Webhooks API to post new channel messages and the Outgoing Webhook API to notify the application about custom channel commands.

Listening for custom commands

When we create a new Outgoing Webhook, messages from the registered channels that begin with the “@phonebot” prefix are posted to the HTTP URL of the IBM Bluemix application handling the incoming messages.

We can create Outgoing Webhooks for every channel we want to register Phonebot in.
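
Slack sends Outgoing Webhook requests as form-encoded POST bodies with fields such as token, channel_name, user_name and text. The sketch below shows one way the application could read such a body; the `parseWebhookBody` helper and the sample values are assumptions for illustration only.

```javascript
var querystring = require('querystring')

// Hypothetical helper: pick out the fields we care about from the
// form-encoded body that Slack sends to an Outgoing Webhook URL.
function parseWebhookBody (body, expectedToken) {
  var fields = querystring.parse(body)

  // Reject requests that do not carry the token issued for this webhook.
  if (fields.token !== expectedToken) return null

  return {
    channel: fields.channel_name,
    user: fields.user_name,
    text: fields.text
  }
}

// Example body with made-up values.
var body = 'token=abc123&channel_name=general&user_name=james&text=%40phonebot+call+%2B15550100'
console.log(parseWebhookBody(body, 'abc123'))
```

Checking the token is a cheap way to make sure a request really came from the webhook we registered.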

For each registered channel, we need to allow Phonebot to post new messages.

Sending new channel messages

Incoming Webhooks provide an obfuscated HTTP URL that allows unauthenticated HTTP requests to create new channel messages. Creating a new Incoming Webhook for each channel we are listening to will allow Phonebot to post responses.

Each Incoming Webhook URL will be passed to the Phonebot application as configuration via environment variables.
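
Posting to an Incoming Webhook amounts to sending a JSON document with a text field to that URL. The following is a minimal sketch using only Node's built-in modules; `buildWebhookRequest`, `postMessage` and the placeholder URL are assumptions for this example rather than Phonebot's own code.

```javascript
var https = require('https')
var URL = require('url').URL

// Build the request options and JSON body for an Incoming Webhook post.
function buildWebhookRequest (webhookUrl, message) {
  var target = new URL(webhookUrl)
  var payload = JSON.stringify({ text: message })

  return {
    options: {
      method: 'POST',
      hostname: target.hostname,
      path: target.pathname + target.search,
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload)
      }
    },
    payload: payload
  }
}

// Send the message; the callback receives an error or the HTTP status code.
function postMessage (webhookUrl, message, callback) {
  var request = buildWebhookRequest(webhookUrl, message)
  var req = https.request(request.options, function (res) {
    res.resume() // discard the response body
    callback(null, res.statusCode)
  })
  req.on('error', callback)
  req.end(request.payload)
}

// Placeholder URL; a real one comes from the Slack integration settings.
console.log(buildWebhookRequest('https://hooks.slack.com/services/T000/B000/XXXX', 'Call connected'))
```

Keeping the request construction separate from the network call makes the payload easy to inspect without contacting Slack.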

Making Phone Calls

Twilio provides “telephony-as-a-service”, allowing applications to make telephone calls using a REST API.

Twilio has been made available on the IBM Bluemix platform. Binding this service to your application will provide the authentication credentials to use with the Twilio client library.

When users issue the “call” command with a phone number, the channel bot listening to user commands emits a custom event.

bot.on('call', function (number) {
  var phone = this.channels[channel].phone

  if (phone.call_active()) {
    bot.post('The line is busy, you have to hang up first...!')
    return
  }

  phone.call(number, this.base_url + '/' + channel)
})

Within the “phone” object, the “call” method triggers the following code.

this.client.makeCall({
  to: number,
  from: this.from,
  url: route
}, function (err, responseData) {
  if (err) {
    that.request_fail('Failed To Start Call: ' + number + '(' + route + ') ', err)
    return
  }

  that.request_success('New Call Started: ' + number + ' (' + route + '): ' + responseData.sid, responseData)
})

The url parameter provides an HTTP URL which Twilio will use to POST updated call status information. HTTP responses from this location tell Twilio how to handle the ongoing call, e.g. play an audio message, press the following digits or record phone line audio.

If the phone call connects successfully, we need the phone line audio stream to translate the speech into text. Unfortunately, Twilio does not support direct access to the real-time audio stream. However, it can record a short batch of audio, e.g. five seconds, and make the resulting file available for download.

Therefore, we will tell Twilio to record a short section of audio and post the results back to our application. When this message is received, our response will contain the request to record another five seconds. This approach will provide a semi-realtime stream of phone call audio for processing.

Here is the code snippet to construct the TwiML response to record the audio snippet. Any channel messages that are queued for sending as speech will be added to the outgoing response.

var twiml = new twilio.TwimlResponse()

// Do we have text to send down the active call?
if (this.outgoing.length) {
  var user_speech = this.outgoing.join(' ')
  this.outgoing = []
  twiml.say(user_speech)
}

twiml.record({playBeep: false, trim: 'do-not-trim', maxLength: this.defaults.duration, timeout: 60})
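
To make the shape of that response more concrete, here is a sketch that builds an equivalent TwiML document by hand, without the Twilio library. The `buildRecordTwiml` and `escapeXml` helpers are hypothetical, and the exact serialisation produced by the library may differ.

```javascript
// Escape text so it is safe to embed in XML.
function escapeXml (text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Hypothetical helper: say any queued text, then record the next batch of audio.
function buildRecordTwiml (outgoing, duration) {
  var say = outgoing.length
    ? '<Say>' + escapeXml(outgoing.join(' ')) + '</Say>'
    : ''
  var record = '<Record playBeep="false" trim="do-not-trim" maxLength="' +
    duration + '" timeout="60"/>'

  return '<?xml version="1.0" encoding="UTF-8"?><Response>' + say + record + '</Response>'
}

console.log(buildRecordTwiml(['Hello', 'is anyone there?'], 5))
```

Because every response ends with a Record verb, each completed recording triggers the next one, which is what produces the semi-realtime loop described above.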

When we have the audio files containing the phone call audio, we can schedule these for translation with the IBM Watson Speech To Text service.

Translating Speech To Text

Using the IBM Watson Speech To Text service, we can transcribe phone calls simply by posting the audio file to the REST API. The client library handles the actual API requests behind a simple JavaScript interface.

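var fs = require('fs')

// Keep a reference to the current object: inside the callback below,
// `this` no longer refers to it.
var self = this

var params = {
  audio: fs.createReadStream(file_name),
  content_type: 'audio/l16; rate=16000'
}

this.speech_to_text.recognize(params, function (err, res) {
  if (err) {
    self.error(err)
    return
  }

  // Take the first alternative for the result at result_index
  var result = res.results[res.result_index]
  if (result) {
    self.transcript = result.alternatives[0].transcript
    self.emit('available')
  } else {
    self.error('Missing speech recognition result.')
  }
})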
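
The response can contain more than one entry in results, for example when the audio contains pauses. As a hedged sketch, the helper below joins the first alternative of every result into a single string; `extractTranscript` and the sample response are made up for illustration and are not part of Phonebot.

```javascript
// Hypothetical helper: join the first alternative of every result into one string.
function extractTranscript (res) {
  if (!res || !Array.isArray(res.results)) return ''

  return res.results
    .map(function (result) {
      var best = result.alternatives && result.alternatives[0]
      return best ? best.transcript.trim() : ''
    })
    .filter(Boolean)
    .join(' ')
}

// Made-up response in the shape the callback above reads from.
var sample = {
  result_index: 0,
  results: [
    { final: true, alternatives: [{ transcript: 'hello there ', confidence: 0.91 }] },
    { final: true, alternatives: [{ transcript: 'can you hear me ', confidence: 0.87 }] }
  ]
}

console.log(extractTranscript(sample))
```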

Having previously handled converting the audio file from the format created by Twilio to the one needed by the Watson API, we were able to reuse the translate.js class between projects.

This module relies on the SoX library being installed in the native runtime. We used a custom buildpack to support this.

Managing Translation Tasks

When a new Twilio message with audio recording details is received, we schedule a translation request. As this background task returns, the results are posted into the corresponding Slack channel.

If a translation request takes longer than expected, additional requests may be scheduled before the first has finished. We still want to maintain the order when posting new channel messages, even if later requests finish translating first.

Using the async library, a single-worker queue is created to schedule the translation tasks.

Each time the phone object for a channel emits a ‘recording’ event, we start the translation request and push the task onto the channel queue.

phone.on('recording', function (location) {
  if (phone.defaults.verbose) {
    this.channels[channel].bot.post(':speech_balloon: _waiting for translation_')
  }
  var req = translate(this.watson, location)
  req.start()
  this.channels[channel].queue.push(req)
})

When a task reaches the front of the queue, the worker function is called to process the result.

If the translation task has already finished, we signal to the queue that this task has completed. Otherwise, we wait for the completion events to be emitted.

var queue = async.queue(function (task, callback) {
  var done = function (message) {
    if (message) this.channels[channel].bot.post(':speech_balloon: ' + message)
    callback()
    return true
  }

  var process = function () {
    return done(task.transcript)
  }

  var failed = function () {
    return done(this.channels[channel].phone.defaults.verbose ? '_unable to recognise speech_' : '')
  }

  if (task.transcript && process()) return
  if (task.failed && failed()) return

  task.on('available', process)
  task.on('failed', failed)
}, 1)
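
To show why a single-worker queue preserves ordering, here is a self-contained sketch that does not depend on the async library. The `createQueue` helper is a simplified stand-in written for this example, and the two tasks are plain event emitters with made-up transcripts.

```javascript
var EventEmitter = require('events').EventEmitter

// Simplified stand-in for a single-worker queue: one task at a time, in push order.
function createQueue (worker) {
  var pending = []
  var busy = false

  function next () {
    if (busy || pending.length === 0) return
    busy = true
    worker(pending.shift(), function () {
      busy = false
      next()
    })
  }

  return {
    push: function (task) {
      pending.push(task)
      next()
    }
  }
}

var posted = []

var queue = createQueue(function (task, callback) {
  function done () {
    posted.push(task.transcript)
    callback()
  }

  // Already finished? Post straight away. Otherwise wait for the event.
  if (task.transcript) return done()
  task.once('available', done)
})

var first = new EventEmitter()
var second = new EventEmitter()
queue.push(first)
queue.push(second)

// The second task finishes before the first...
second.transcript = 'world'
second.emit('available')
first.transcript = 'hello'
first.emit('available')

// ...but the results are still posted in the order the tasks were queued.
console.log(posted)
```

The worker only ever looks at the task at the front of the queue, so a later task that finishes early simply waits with its result until its turn comes.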

Deploying Phonebot

Now we’ve finished the code, we can configure the application to deploy on the IBM Bluemix cloud platform.

Configuring Webhooks

Phonebot must be passed the configured Incoming Webhook URLs, allowing it to send channel messages. Following the standard Platform-as-a-Service convention for passing configuration, we store the channel webhooks as environment variables.

Using the CF CLI, we run the following command to create a user-provided service holding these parameters.

$ cf cups slack_webhooks -p '{"channel_name":"incoming_webhook_url",...}'
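
Once the service is bound, Cloud Foundry exposes its credentials to the application through the VCAP_SERVICES environment variable, with user-provided services listed under the user-provided key. The sketch below shows one way to read them; the `userProvidedCredentials` helper and the sample values are assumptions for illustration.

```javascript
// Hypothetical helper: look up a user-provided service's credentials by name.
function userProvidedCredentials (vcapServices, name) {
  var services = JSON.parse(vcapServices || '{}')
  var provided = services['user-provided'] || []

  for (var i = 0; i < provided.length; i++) {
    if (provided[i].name === name) return provided[i].credentials
  }
  return null
}

// Made-up example of what VCAP_SERVICES might contain after binding.
var example = JSON.stringify({
  'user-provided': [{
    name: 'slack_webhooks',
    credentials: { general: 'https://hooks.slack.com/services/T000/B000/XXXX' }
  }]
})

console.log(userProvidedCredentials(example, 'slack_webhooks'))

// In the deployed application this would be called as:
// userProvidedCredentials(process.env.VCAP_SERVICES, 'slack_webhooks')
```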

Application Manifest

Application manifests configure deployment parameters for Cloud Foundry applications. Phonebot will need to be bound to the Twilio, IBM Watson and custom services, and the manifest also configures the runtime environment.

---
applications:
- name: phonebot 
  memory: 256M 
  command: node app.js
  buildpack: https://github.com/jthomas/nodejs-buildpack.git
  services:
  - twilio
  - speech_to_text
  - slack_webhooks
declared-services:
  twilio:
    label: Twilio
    plan: 'user-provided'
  slack_webhooks:
    label: slack_webhooks
    plan: 'user-provided'
  speech_to_text:
    label: speech_to_text
    plan: free

…with this manifest, we can just use the cf push command to deploy our application!

Using Phonebot

On startup, Phonebot posts a message to each channel that was registered successfully.

Users can issue @phonebot COMMAND messages to control phone calls directly from the Slack channel.

For further information, follow the project on GitHub. Upcoming features are listed on the issues page. Please feel free to request new features, report bugs and leave feedback on GitHub.
