Cognitive Bots With IBM Watson

May 10, 2016
watson bots bluemix

Later this month, I’m speaking at Twilio’s conference about building cognitive bots with IBM Watson. Preparing for this presentation, I’ve been experimenting with the IBM Watson services to build sample bots that can understand, and act on, natural language.

IBM’s artificial intelligence system, [Watson](https://en.wikipedia.org/wiki/Watson_(computer)), now provides a series of “cognitive” services through IBM’s Bluemix cloud platform. Developers can integrate natural language processing, image and speech recognition, emotion analysis and more into their applications using RESTful APIs.

The Watson Developer Cloud site has numerous sample apps showing how to combine these services to build “cognitive” bots.

In one of the samples, the Dialog service is used to develop a pizza ordering bot. Users can order a pizza, specifying the size, toppings and delivery method, using natural language.

Once I understood how this sample worked, I had an idea for enhancing it with the Tone Analyzer service…

Where the heck is my pizza?

Let’s imagine the customer has ordered a delivery using pizza bot and the driver is being (even) slower than normal.

If the customer asks

“Where is my pizza?”

we return the standard message every pizza takeaway gives when you call to ask where the driver is…

“The driver has just left, he’ll be ten minutes.”

An hour later…

When the driver still hasn’t arrived, the customer will probably ask again, with a bit less civility…

“Where the heck is my pizza? I ordered an hour ago! This is ridiculous.”

At this point, the “just ten minutes” reply is not going to be well received!

Building bots that can understand conversation tone will mean we can script a suitable response, rather than infuriating our hungry customers.

Using the Tone Analyzer service, I wanted to enhance the sample so that conversation sentiment affects the dialogue. Bot responses should depend on both the user’s input and the sentiment of the conversation.

Let’s review both services before looking at how to combine them to create the improved pizza bot…

IBM Watson Dialog

The IBM Watson Dialog service lets developers script natural language conversations between a virtual agent and a user. Developers build a decision tree for the dialogue, using a markup language to define the conversation paths.

Developers can then utilise the pre-defined linguistic model to converse with users. The system will keep track of the conversation state when processing user input to generate a suitable response. It can also store conversation properties, either extracted from user input or manually updated through the API.

These conversation properties can be used to control the dialogue branching.

Documentation on the service is available here.

IBM Watson Tone Analyzer

The IBM Watson Tone Analyzer Service uses linguistic analysis to detect three types of tones from text: emotion, social tendencies, and language style.

Emotions identified include anger, fear, joy, sadness and disgust. Social tendencies are drawn from the Big Five personality traits used by some psychologists: openness, conscientiousness, extroversion, agreeableness and emotional range. Language styles include confident, analytical and tentative.
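To make these tone categories concrete, here is a minimal sketch of reading emotion scores out of a Tone Analyzer result. It assumes the response shape used by the backend code later in this post (`document_tone.tone_categories`, each category with a `category_id` and a list of `tones` carrying `tone_id` and `score`). The helper names `emotionScores` and `strongestEmotion` are hypothetical, and the sample values are made up rather than real service output.

```javascript
// Hypothetical helpers. Assumed response shape:
// { document_tone: { tone_categories: [{ category_id, tones: [{ tone_id, score }] }] } }

// Return the emotion scores as a plain { tone_id: score } object.
function emotionScores(toneResponse) {
  const categories = toneResponse.document_tone.tone_categories;
  const emotion = categories.find((c) => c.category_id === 'emotion_tone');
  const scores = {};
  if (!emotion) return scores; // no emotion category in this response
  for (const t of emotion.tones) {
    scores[t.tone_id] = t.score;
  }
  return scores;
}

// Return the id of the highest-scoring emotion, or null if there are none.
function strongestEmotion(toneResponse) {
  const scores = emotionScores(toneResponse);
  let best = null;
  for (const [id, score] of Object.entries(scores)) {
    if (best === null || score > scores[best]) best = id;
  }
  return best;
}

// Made-up example input, not real service output.
const sample = {
  document_tone: {
    tone_categories: [
      {
        category_id: 'emotion_tone',
        tones: [
          { tone_id: 'anger', score: 0.82 },
          { tone_id: 'disgust', score: 0.12 },
          { tone_id: 'fear', score: 0.05 },
          { tone_id: 'joy', score: 0.01 },
          { tone_id: 'sadness', score: 0.2 },
        ],
      },
      { category_id: 'language_tone', tones: [{ tone_id: 'confident', score: 0.3 }] },
    ],
  },
};

console.log(strongestEmotion(sample)); // prints "anger"
```

Keeping the scores in a flat object makes it easy to pick out a single emotion, such as anger, to pass on to the Dialog service.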

Documentation on the service is available here.

Extending Pizza Bot

To extend pizza bot with dialogue about delivery times, we start by identifying when the user is asking about their delivery. Unless the user is angry, we return the default response. When sentiment analysis indicates the user is angry, we branch to a more sympathetic message.

Matching User Input

To match user input about delivery times, there are a few common questions we want to capture:

Where’s my order?
How long will it be until my pizza arrives?
When will my takeout get here?

Creating our new conversation branch within a folder element will allow us to group the necessary input, grammar and output elements as a logical section.

<folder label="Order">
  <input>
    <grammar>
      ...
    </grammar>
    <output>
      <prompt selectionType="RANDOM">
        ...
      </prompt>
    </output>
  </input>
</folder>

With this structure, the output element is processed to generate the bot reply only if the input grammar matches the user input. Adding item nodes under the input’s grammar element lets us define the matching criteria:

<grammar>
  <item>$where* order</item>
  <item>$where* pizza</item>
  <item>$how long* order</item>
  <item>$how long* pizza</item>
  <item>$when * order * here</item>
  <item>$when * pizza * here</item>
</grammar>

Using the wildcard characters $ and * means a single grammar item such as “$where* pizza” will match questions including “Where is my pizza?” and “Where’s my pizza?”, rather than us having to define every permutation manually.

People often use synonyms in natural language. Rather than manually defining grammar rules for every alternative word for pizza and order, we can add concept elements to match these automatically. The sample already defines a concept element for the term pizza, so we only need to add one for order.

<concept>
  <grammar>
    <item>Order</item>
    <item>Takeaway</item>
    <item>Takeout</item>
    <item>Delivery</item>
  </grammar>
</concept>

Grammar rules that include the term order will now also match takeaway, takeout and delivery.
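To build intuition for how wildcards and concepts combine, here is a rough approximation in plain JavaScript. It is not the Dialog service’s actual matching algorithm. It assumes that `*` matches zero or more whole words, that a leading `$` lets the phrase appear anywhere in the input, and that words in the same concept group are interchangeable. The `CONCEPTS` table, the `'s` expansion and all function names are hypothetical.

```javascript
// Rough approximation of Dialog-style grammar matching, for illustration only.
// Assumptions (not the Dialog service's real algorithm):
//   - "*" matches zero or more whole words
//   - a leading "$" lets the phrase appear anywhere in the input
//   - words in the same concept group are interchangeable

// Hypothetical concept group mirroring the <concept> element above.
const CONCEPTS = [['order', 'takeaway', 'takeout', 'delivery']];

// Lower-case the input, expand "'s" to "is" and drop punctuation.
function normalise(text) {
  return text
    .toLowerCase()
    .replace(/[’']s\b/g, ' is')
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// True if two words are equal or belong to the same concept group.
function sameConcept(a, b) {
  return a === b || CONCEPTS.some((group) => group.includes(a) && group.includes(b));
}

// Recursively match a token pattern against a list of words.
function matchTokens(pattern, words) {
  if (pattern.length === 0) return words.length === 0;
  const [head, ...rest] = pattern;
  if (head === '*') {
    // Let the wildcard swallow 0, 1, 2, ... words.
    for (let i = 0; i <= words.length; i++) {
      if (matchTokens(rest, words.slice(i))) return true;
    }
    return false;
  }
  return words.length > 0 && sameConcept(head, words[0]) && matchTokens(rest, words.slice(1));
}

// Match one grammar item (e.g. "$when * order * here") against user input.
function matchesGrammarItem(item, input) {
  const anywhere = item.startsWith('$');
  let pattern = item.replace(/^\$/, '').replace(/\*/g, ' * ').trim().toLowerCase().split(/\s+/);
  if (anywhere) pattern = ['*', ...pattern, '*'];
  return matchTokens(pattern, normalise(input));
}

const items = ['$where * order', '$where * pizza', '$how long * order', '$when * order * here'];
const matches = (input) => items.some((item) => matchesGrammarItem(item, input));

console.log(matches("Where's my takeout?"));             // true
console.log(matches('When will my delivery get here?')); // true
console.log(matches('Can I add extra cheese?'));         // false
```

The concept lookup is what lets the single word order in a grammar item cover takeout and delivery without extra rules.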

Adding Default Response

Having matched the user input, we return a default response, chosen at random from a predefined list.

<output>
  <prompt selectionType="RANDOM">
    <item>I've just checked and the driver is ten minutes away, is there anything else I can help with?</item>
    <item>Hmmm the driver's running a bit late, they'll be about ten minutes. Is there anything else I can help with?</item>
    <item>They should be with you in ten minutes. Is there anything else I can help with?</item>
  </prompt>
  <goto ref="getUserInput_2442994"/>
</output>

Handling Angry Customers

Within the dialog markup, profile variables can be defined to store conversation entities. These variables can be referenced by conditional branches in the markup to control responses.

We define a new profile variable to hold the anger score. Our backend can update this value through the API before the current user input is processed, so the dialogue response reflects the latest score.

<variables>
  <var_folder name="Home">
    ...
    <var name="anger" type="NUMBER" initValue="0" description="Anger emotion score for conversation."/>
  </var_folder>
</variables>
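As a small sketch of the update step, the helper below builds the parameters for `dialog.updateProfile()` in the same shape the backend code in this post uses (`client_id`, `dialog_id`, `name_values`). The helper name `buildProfileParams`, the choice to send only declared variable names, and the fallback to 0 (mirroring `initValue="0"`) are assumptions made for illustration.

```javascript
// Hypothetical helper: builds dialog.updateProfile() parameters in the shape
// used by the backend code in this post ({ client_id, dialog_id, name_values }).
// Only names listed in `variables` are sent; a missing or non-numeric score
// falls back to 0, mirroring the initValue="0" declared for "anger".
function buildProfileParams(clientId, dialogId, scores, variables) {
  return {
    client_id: clientId,
    dialog_id: dialogId,
    name_values: variables.map((name) => ({
      name: name,
      value: typeof scores[name] === 'number' ? scores[name] : 0,
    })),
  };
}

// Made-up ids and scores for illustration.
const params = buildProfileParams(42, 'example-dialog-id', { anger: 0.82, joy: 0.01 }, ['anger']);
console.log(JSON.stringify(params.name_values)); // [{"name":"anger","value":0.82}]
```

Restricting the update to declared variables keeps the profile in step with the `<variables>` section of the markup.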

Adding a child branch for the conditional response after the input grammar lets us return a custom response when the anger profile variable is above a threshold.

<folder label="Order">
  <input>
    <grammar>
      <item>$where* order</item>
    </grammar>
    <if matchType="ANY">
      <cond varName="anger" operator="GREATER_THEN">0.50</cond>
      <output>
        <prompt selectionType="RANDOM">
          <item>Please accept our apologies for the delivery driver being very late. Could you call us on 0800 800 800 and we'll get this fixed?</item>
        </prompt>
      </output>
    </if>
    ...
  </input>
</folder>

When we’ve detected the user is angry about the delivery delay, we direct them to ring the restaurant to find out what’s happened to the driver.
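The branching in the markup above can be mirrored in plain JavaScript to make the decision explicit. This is only a sketch: it assumes `GREATER_THEN` means strictly greater than, reuses the 0.50 threshold and reply text from the markup, and imitates `selectionType="RANDOM"` with `Math.random`. The function name `deliveryReply` and the injectable `random` parameter are hypothetical.

```javascript
// Plain-JavaScript mirror of the Dialog markup, for illustration only.
const ANGER_THRESHOLD = 0.5; // same value as the <cond> element

// Default replies from the <prompt selectionType="RANDOM"> element.
const DEFAULT_REPLIES = [
  "I've just checked and the driver is ten minutes away, is there anything else I can help with?",
  "Hmmm the driver's running a bit late, they'll be about ten minutes. Is there anything else I can help with?",
  "They should be with you in ten minutes. Is there anything else I can help with?",
];

const APOLOGY =
  "Please accept our apologies for the delivery driver being very late. Could you call us on 0800 800 800 and we'll get this fixed?";

// Pick a reply to a delivery question given the stored anger score.
// `random` can be replaced to make the selection deterministic in tests.
function deliveryReply(anger, random = Math.random) {
  // Assumption: GREATER_THEN is a strict comparison.
  if (anger > ANGER_THRESHOLD) return APOLOGY;
  const i = Math.floor(random() * DEFAULT_REPLIES.length);
  return DEFAULT_REPLIES[i];
}

console.log(deliveryReply(0.82)); // prints the apology
```

Checking the anger branch before the default output matches the order of the markup, where the `<if>` sits ahead of the normal reply.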

Combining Watson Services

We then modify the backend service that calls Watson so that it passes the user’s input through the Tone Analyzer service and manually updates the user’s anger score in their profile, before calling the Dialog service.

This anger score will be used to control the dialogue response in real-time.

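// Assumes `app` (an Express app with body parsing), `tone_analyzer`, `dialog`,
// `dialog_id` and `extend` are already set up, as in the original sample app.
app.post('/conversation', function (req, res, next) {
  // 1. Score the emotional tone of the user's latest message.
  tone_analyzer.tone({ text: req.body.input }, function (err, tone) {
    if (err) return next(err);

    var categories = tone.document_tone.tone_categories;
    var emotion_tones = categories.find(function (category) {
      return category.category_id === 'emotion_tone';
    });
    var anger_tone = emotion_tones.tones.find(function (t) {
      return t.tone_id === 'anger';
    });

    // 2. Store the anger score in the user's Dialog profile.
    var profile = {
      client_id: req.body.client_id,
      dialog_id: dialog_id,
      name_values: [{ name: 'anger', value: anger_tone.score }]
    };
    dialog.updateProfile(profile, function (err) {
      if (err) return next(err);

      // 3. Fetch the reply; the "anger" variable now controls branching.
      var params = extend({ dialog_id: dialog_id }, req.body);
      dialog.conversation(params, function (err, results) {
        if (err) return next(err);
        res.json({ dialog_id: dialog_id, conversation: results });
      });
    });
  });
});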
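To exercise this flow without Bluemix credentials, one option is to pass the Watson clients in as arguments and swap in fakes. The sketch below is a hypothetical refactoring of the route handler: `makeConversationHandler` and the fake clients are invented for illustration, the fake responses are made up, and `Object.assign` stands in for the sample’s `extend` helper. It also falls back to an anger score of 0 when no anger tone is returned, which is a design choice rather than documented service behaviour.

```javascript
// Hypothetical refactoring: same flow as the route handler, with the Watson
// clients injected so it can run against fakes (no network access).
function makeConversationHandler(toneAnalyzer, dialog, dialogId) {
  return function (req, res, next) {
    toneAnalyzer.tone({ text: req.body.input }, function (err, tone) {
      if (err) return next(err);
      const emotion = tone.document_tone.tone_categories.find((c) => c.category_id === 'emotion_tone');
      const anger = emotion ? emotion.tones.find((t) => t.tone_id === 'anger') : undefined;
      const profile = {
        client_id: req.body.client_id,
        dialog_id: dialogId,
        name_values: [{ name: 'anger', value: anger ? anger.score : 0 }], // 0 if no anger tone
      };
      dialog.updateProfile(profile, function (err) {
        if (err) return next(err);
        // Object.assign stands in for the sample's extend() helper.
        const params = Object.assign({ dialog_id: dialogId }, req.body);
        dialog.conversation(params, function (err, results) {
          if (err) return next(err);
          res.json({ dialog_id: dialogId, conversation: results });
        });
      });
    });
  };
}

// Fake clients that answer immediately; all values are made up.
const calls = [];
const fakeToneAnalyzer = {
  tone(params, cb) {
    cb(null, { document_tone: { tone_categories: [
      { category_id: 'emotion_tone', tones: [{ tone_id: 'anger', score: 0.9 }] },
    ] } });
  },
};
const fakeDialog = {
  updateProfile(params, cb) { calls.push(['updateProfile', params]); cb(null, {}); },
  conversation(params, cb) { calls.push(['conversation', params]); cb(null, { response: ['(reply)'] }); },
};

const handler = makeConversationHandler(fakeToneAnalyzer, fakeDialog, 'test-dialog');
let sent = null;
handler(
  { body: { input: 'Where the heck is my pizza?', client_id: 7 } },
  { json: (body) => { sent = body; } },
  (err) => { throw err; }
);
console.log(calls.map((c) => c[0]).join(' -> ')); // prints "updateProfile -> conversation"
```

Because the fakes call back synchronously, the whole request completes before the final `console.log`, which makes it easy to check that the profile is updated before the dialogue reply is requested.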

The commit log for the fork shows the full changes needed to integrate this feature.

Conclusion

Bots are a huge trend for 2016. One of the major challenges to developing your own bots is handling user input using natural language. How can you go beyond simple keyword matching and regular expressions to build solutions that actually understand what your user is asking?

Using the IBM Watson Dialog service, developers can script natural language conversations. They define a linguistic model for their dialogue using a markup language, and the system uses this model to process natural language input and return the appropriate response. Conversation entities are recognised and stored in a user profile.

By combining this service with the IBM Watson Tone Analyzer, developers can script conversations that use the user’s emotional tone to modify the response.

Modifying the pizza sample, we incorporate the anger score to return a more appropriate response when the user is angry about their delivery being delayed.

IBM Watson has many other services that can be integrated with the Dialog service, using the same pattern, to build “cognitive” bots. These services take on much of the hard work of building bots that understand natural language input and respond appropriately to the user’s emotions.