AI

We use the terms conversational assistants, AI assistants, chatbots, and bots interchangeably. What we always mean, though, is intelligent conversational assistants, a term that captures our ambitions nicely: intelligent conversational assistants steer through a conversation fluidly and naturally, they are helpful, and they act like real dialog partners. Our goal is nothing less than lifting the state of the art from click bots to intelligent conversational assistants. That's why we put the AI in Mercury.ai.

But everyone is talking about AI nowadays. What actually makes a conversational assistant intelligent? Here is our approach to intelligent dialog behavior and continuous learning.

Intelligent dialog behavior

Intelligence is what makes a dialog feel natural and helpful. Putting this into practice is not simply about how much machine learning one employs. Instead, it is about whether dialog behavior displays the following properties:

  • It follows a goal. Messages are relevant and informative, and they steer the conversation towards a common goal.
  • It is contextualized, i.e. it takes into account the current state of the conversation. What are we talking about? What makes sense in the current situation and what doesn't? What do I already know about the user?
  • It is mixed-initiative. Initiative can be understood as who is controlling the interaction. In chatbot interactions, we often see either the user or the bot in control, while the other merely responds when prompted. What makes dialog effective and seamless, however, is mixed-initiative interaction, in which both the user and the bot can bring up topics and questions, and can switch between topics as they see fit. This makes bot and human a team.
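
To make mixed initiative concrete, here is a minimal Python sketch that assumes the bot keeps a simple stack of its own open questions. The names `DialogState`, `ask`, and `handle` are hypothetical and not part of the Mercury.ai platform. The bot can take the initiative by asking, the user can take it back by raising a new topic, and once that topic is dealt with the bot resumes its pending question.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogState:
    """Hypothetical dialog state for illustrating mixed initiative."""
    pending: list[str] = field(default_factory=list)  # bot questions awaiting an answer
    slots: dict[str, Optional[str]] = field(default_factory=dict)  # what the bot knows so far

    def ask(self, topic: str) -> str:
        """Bot initiative: open a topic by asking the user about it."""
        self.pending.append(topic)
        return f"ask: {topic}"

    def handle(self, answer: Optional[str] = None, new_topic: Optional[str] = None) -> list[str]:
        """Process one user turn, which either answers or raises a new topic."""
        replies = []
        if new_topic is not None:
            # User initiative: deal with the user's topic first ...
            replies.append(f"answer: {new_topic}")
            # ... then hand the initiative back by resuming the open question.
            if self.pending:
                replies.append(f"resume: {self.pending[-1]}")
        elif self.pending:
            # The user answered the bot's most recent open question.
            topic = self.pending.pop()
            self.slots[topic] = answer
            replies.append(f"noted: {topic} = {answer}")
        return replies


state = DialogState()
print(state.ask("preferred colour"))               # ask: preferred colour
print(state.handle(new_topic="shipping costs"))    # ['answer: shipping costs', 'resume: preferred colour']
print(state.handle(answer="blue"))                 # ['noted: preferred colour = blue']
```

A stack is only one possible design; it has the convenient property that interruptions can nest and are resolved in reverse order.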

In Mercury.ai bots, the way from user message to bot response involves roughly three components:

  1. The bot first processes the incoming user message and maps it to intents, formal representations of what the user wants to express. This is subsumed under the term natural language processing (NLP).
  2. The bot then decides on the appropriate reaction to those intents, the realm of dialog behavior.
  3. Finally, it puts the reaction into specific bot messages; this is called natural language generation (NLG).
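
The three stages can be pictured as a chain of functions. The following Python sketch is purely illustrative: the `Intent` type, the keyword rules in `understand`, the reaction table in `decide`, and the message templates in `generate` are hypothetical stand-ins, not how Mercury.ai bots are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)  # slot values (unused in this toy)


def understand(message: str) -> list[Intent]:
    """NLP: map the raw message to intents (toy keyword rules for illustration)."""
    text = message.lower()
    intents = []
    if "hello" in text:
        intents.append(Intent("greet"))
    if "price" in text or "cost" in text:
        intents.append(Intent("ask_price"))
    return intents


def decide(intents: list[Intent]) -> list[str]:
    """Dialog behavior: choose abstract reactions (dialog acts) for the intents."""
    reactions = {"greet": "greet_back", "ask_price": "offer_price_lookup"}
    acts = [reactions[i.name] for i in intents if i.name in reactions]
    return acts or ["ask_to_rephrase"]


# NLG templates: one surface form per dialog act.
TEMPLATES = {
    "greet_back": "Hello! How can I help you?",
    "offer_price_lookup": "Let me look up the price for you.",
    "ask_to_rephrase": "Sorry, could you rephrase that?",
}


def generate(acts: list[str]) -> list[str]:
    """NLG: turn abstract reactions into concrete bot messages."""
    return [TEMPLATES[act] for act in acts]


def respond(message: str) -> list[str]:
    return generate(decide(understand(message)))
```

Because `decide` works on intents and dialog acts rather than raw text, the wording in `TEMPLATES` can be changed, for example to match a brand's voice, without touching the decision logic.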

NLP determines how well a bot understands what you mean, while dialog behavior determines how intelligently it reacts. In order for dialog behavior to react intelligently, it needs to take into account more than just the user intents understood by NLP. In particular, it needs to involve:

  • The current state and the history of the conversation. Are we discussing a specific question? Are we talking about a particular product? Have I asked this before?
  • Knowledge about the user: Maybe the user already gave me information, so that some options or solutions are possible to pursue, while others are not.
  • The underlying datasource. If I want to assist the user in a product search, which question actually makes the most sense to ask? If a question does not lead to results or does not narrow down the search space, it doesn't help to ask it.
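
The last two points can be combined into a measurable rule for choosing the next question. One simple heuristic, sketched here in Python with a made-up product catalogue, is to first filter by what the user has already said and then ask about the attribute whose answer leaves the smallest expected number of candidates. The catalogue, attribute names, and scoring rule are illustrative assumptions, not Mercury.ai's actual selection logic.

```python
from collections import Counter
from typing import Optional


def expected_remaining(products: list[dict], attribute: str) -> float:
    """Expected number of products left after the user answers `attribute`,
    assuming each product is equally likely to be the one the user wants."""
    counts = Counter(p[attribute] for p in products)
    # A value shared by n products is chosen with probability n/total and
    # leaves n candidates, so the expectation is sum(n * n) / total.
    return sum(n * n for n in counts.values()) / len(products)


def best_question(products: list[dict], known: dict) -> Optional[str]:
    """Pick the most informative attribute to ask about next, or None."""
    # Use what we already know about the user to filter the datasource.
    candidates = [p for p in products if all(p.get(k) == v for k, v in known.items())]
    if len(candidates) <= 1:
        return None  # nothing left to narrow down
    options = [a for a in candidates[0] if a != "name" and a not in known]
    # Skip attributes on which all candidates agree: asking them cannot help.
    useful = [a for a in options if len({p[a] for p in candidates}) > 1]
    if not useful:
        return None
    return min(useful, key=lambda a: expected_remaining(candidates, a))


# Hypothetical catalogue for illustration only.
CATALOGUE = [
    {"name": "A", "type": "sneaker", "colour": "black", "size": "42"},
    {"name": "B", "type": "sneaker", "colour": "white", "size": "42"},
    {"name": "C", "type": "boot", "colour": "black", "size": "42"},
    {"name": "D", "type": "sneaker", "colour": "red", "size": "43"},
]
```

Here asking for the colour is preferred at the start, since it splits the catalogue most evenly; once the user has said "black", only the type still distinguishes the remaining products.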

Learning dialog behavior

Sometimes dialog systems are conceptualized as end-to-end black boxes, with a machine learning algorithm learning a mapping from user messages to bot messages. In our view, this is too big a problem to swallow at once. It is also not one we find very desirable, as it leaves few options to fine-tune the internal workings. Instead, we give customers full control over their bots, so that the bots always and predictably comply with company policies in what they say, and stay in the voice of the brand in how they say it.

There are two places in the pipeline where we see machine learning as a powerful technology fit for the problem: natural language processing, and controlled learning from user feedback. We say more about both of them in the following sections.

Machine learning for natural language processing

Natural language processing is one of the areas in which progress in machine learning technology has had a great impact in recent years. Our approach is a pragmatic compromise between good generalizability, high accuracy, and early bootstrapping, that is, inducing a useful model from only a small number of training examples.

We follow a hybrid approach to natural language understanding (NLU), consisting of a symbolic strand based on pattern-based methods and a statistical strand that adds robustness and generalizes beyond the predefined cases handled by the rule-based component.

The symbolic NLU component uses language- and game-specific patterns for parsing user input. We have built a collection of basic patterns for each game type and each supported language. They cover common phrases applicable across projects and can then be adapted to the specific content and intents of a bot. They thus bootstrap the bot's natural language understanding capabilities and allow bots to come to confident and accurate interpretations of user input even before any training data is collected.
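
A pattern-based parser can be as simple as a list of regular expressions per language and game type, each tied to an intent, with named groups capturing slot values. The Python sketch below assumes an English pattern set for a hypothetical "product_search" game type; the intents, slot names, and patterns are invented for illustration and are not Mercury.ai's actual pattern collection.

```python
import re
from typing import Optional

# Hypothetical basic patterns keyed by (language, game type). Each entry pairs
# an intent with a regular expression whose named groups become slot values.
BASE_PATTERNS = {
    ("en", "product_search"): [
        ("greet", r"(?:hi|hello|hey)\b"),
        ("search_product",
         r"(?:show me|i am looking for|i'm looking for)\s+(?:an?\s+)?(?P<product>.+?)[.!?]*$"),
        ("ask_price", r"how much (?:does|do) (?P<product>.+?) cost\b"),
    ],
}


def compile_patterns(lang: str, game_type: str, extra: Optional[list[tuple[str, str]]] = None):
    """Combine the basic patterns with bot-specific ones and compile them."""
    entries = BASE_PATTERNS[(lang, game_type)] + (extra or [])
    return [(intent, re.compile(rx, re.IGNORECASE)) for intent, rx in entries]


def parse(message: str, patterns) -> Optional[tuple[str, dict]]:
    """Return (intent, slots) for the first pattern matching the message start."""
    text = message.strip()
    for intent, rx in patterns:
        m = rx.match(text)
        if m:
            slots = {k: v.strip() for k, v in m.groupdict().items() if v is not None}
            return intent, slots
    return None


# A bot adapts the basics by adding its own intents, e.g. order tracking.
patterns = compile_patterns("en", "product_search",
                            extra=[("track_order", r"where is my order\s*#?(?P<order_id>\d+)")])
print(parse("Show me a red sneaker", patterns))
```

No training data is involved: the patterns give confident interpretations from day one, at the cost of only covering phrasings someone has written down.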

The statistical strand is based on machine learning models that address two main learning tasks: intent recognition and slot filling. Their parameters are optimized on our own benchmarking datasets from real customer projects. The machine learning approach complements the rule-based approach by adding robustness and the ability to generalize, and by continuing to learn from real conversations throughout the lifetime of a bot.
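
How the two strands can work together is sketched below. As a stand-in for whatever models are actually used, this Python example trains a tiny multinomial naive Bayes intent classifier on a handful of example utterances and consults it only when no pattern fires. The training data, the `greet_pattern` stub, the 0.6 threshold, and the fallback order are assumptions for illustration; slot filling is left out to keep the sketch short.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesIntents:
    """Multinomial naive Bayes intent classifier with add-one smoothing."""

    def __init__(self, examples: list[tuple[str, str]]):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies
        self.intent_counts = Counter()           # intent -> number of examples
        for text, intent in examples:
            self.intent_counts[intent] += 1
            self.word_counts[intent].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, text: str) -> tuple[str, float]:
        """Return the most likely intent and its posterior probability."""
        total = sum(self.intent_counts.values())
        log_scores = {}
        for intent, n in self.intent_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training carry no evidence
                    score += math.log((counts[w] + 1) / denom)
            log_scores[intent] = score
        # Normalise the log scores into posterior probabilities.
        top = max(log_scores.values())
        weights = {i: math.exp(s - top) for i, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def greet_pattern(text: str) -> Optional[str]:
    """Stand-in for the symbolic strand: one hypothetical greeting pattern."""
    return "greet" if re.match(r"(?:hi|hello)\b", text, re.IGNORECASE) else None


def hybrid_parse(text, pattern_parse, model, threshold=0.6):
    """Prefer a pattern match; otherwise use the classifier if it is confident enough."""
    intent = pattern_parse(text)
    if intent is not None:
        return intent, 1.0, "pattern"  # pattern hits are treated as certain here
    intent, p = model.predict(text)
    return (intent if p >= threshold else None), p, "model"


# Hypothetical training data.
EXAMPLES = [
    ("what does this cost", "ask_price"),
    ("how expensive is it", "ask_price"),
    ("price please", "ask_price"),
    ("where is my parcel", "track_order"),
    ("my order has not arrived", "track_order"),
    ("when will my delivery arrive", "track_order"),
]
model = NaiveBayesIntents(EXAMPLES)
```

The classifier recognizes "is it expensive" as a price question although no pattern or training sentence contains that exact phrasing, which is the kind of generalization the statistical strand contributes.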

At Mercury.ai, natural language processing is not synonymous with machine learning. Machine learning is an important driver in understanding user intents, but it cooperates with rule-based components, which play a legitimate role in aspects that are not probabilistic in nature. Reference resolution is one example: if the bot presents several products, the referent of expressions like "the first one" is deterministically given by the context. Likewise, if the conversation revolves around one specific product, it is clear that "it" in "how much does it cost" refers to that particular product.
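
Because such references are fixed by the dialog context, plain rules suffice to resolve them. A minimal Python sketch, assuming the context stores the products the bot last presented (in order) and the product currently under discussion; the `Context` fields and the ordinal table are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ordinal vocabulary mapped to list positions (-1 = last item).
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


@dataclass
class Context:
    presented: list[str] = field(default_factory=list)  # products last shown, in order
    focus: Optional[str] = None  # product the conversation is currently about


def resolve(text: str, ctx: Context) -> Optional[str]:
    """Deterministically resolve "the first one" / "it" style references."""
    m = re.search(r"\bthe (first|second|third|last)(?: one)?\b", text, re.IGNORECASE)
    if m and ctx.presented:
        index = ORDINALS[m.group(1).lower()]
        if -len(ctx.presented) <= index < len(ctx.presented):
            ctx.focus = ctx.presented[index]  # the chosen product becomes the topic
            return ctx.focus
        return None  # e.g. "the third one" when only two products were shown
    if re.search(r"\b(?:it|this one|that one)\b", text, re.IGNORECASE):
        return ctx.focus
    return None


ctx = Context(presented=["Runner X", "Trail Pro", "City Walk"])
resolve("I like the second one", ctx)   # selects "Trail Pro" and makes it the focus
resolve("How much does it cost?", ctx)  # "it" now refers to "Trail Pro"
```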

Active learning

The notion of self-learning is, for many people, a crucial part of their understanding of "artificial intelligence". And it is true that the capacity to self-improve is an essential quality of intelligence. But in a business application, chatbots are still tools. For us, there is thus one additional aspect that is important: improvement must always be directed at the bot's capacity to achieve the outcomes it was set up for. Therefore, any self-learning functionality of bots on our platform must meet two key criteria:

  • It must serve the user in the conversation.
  • It must give the business user full control over what the bot will learn.

One functionality that fulfills these criteria is active learning, a special case of machine learning that lets users become teachers for bots. For example, when the bot is unsure about a user's intent, it asks the user for guidance. By proposing "best guess" interpretations of the message and letting the user choose, the bot opens up a route for the user to pursue their conversational goal, while at the same time engaging the user as a teacher.
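
The decision to ask for guidance can be sketched as a confidence check on the NLU output. In this Python example, an illustration under assumptions rather than the platform's actual logic, the bot proceeds when the top intent's score clears a threshold and otherwise offers the top-ranked candidates as options; the user's choice then becomes a labelled example. The `Candidate` type, the threshold of 0.7, and the limit of three options are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    intent: str
    score: float  # NLU confidence for this interpretation


def next_step(message: str, candidates: list[Candidate],
              threshold: float = 0.7, max_options: int = 3) -> dict:
    """Proceed if the best interpretation is confident, else ask the user."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if ranked and ranked[0].score >= threshold:
        return {"action": "proceed", "intent": ranked[0].intent}
    # Unsure: offer the best guesses so the user can pick what they meant.
    options = [c.intent for c in ranked[:max_options]]
    return {"action": "ask_user", "message": message, "options": options}


def record_choice(message: str, chosen: str) -> dict:
    """Turn the user's choice into a new training example awaiting review."""
    return {"text": message, "intent": chosen, "status": "pending_review"}
```

Ranking the options by score keeps the most plausible interpretation first, so in most cases the user can continue with a single tap.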

The result of that interaction is new training data, which is lined up for approval by a business user on the Mercury.ai platform. Only after a successful review will the data be used in the bot training. This allows business users to keep full control, even in a self-learning scenario.
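
The review gate can be modelled as a queue in which user-provided examples wait until a business user approves or rejects them, and only approved examples reach the training set. The Python sketch below is a hypothetical illustration; the class, its methods, and the status values are not part of the Mercury.ai platform API.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    text: str
    intent: str
    status: str = "pending"  # "pending", "approved" or "rejected"


@dataclass
class ReviewQueue:
    examples: list[Example] = field(default_factory=list)

    def submit(self, text: str, intent: str) -> Example:
        """Add an example collected in a conversation; it starts unreviewed."""
        ex = Example(text, intent)
        self.examples.append(ex)
        return ex

    def pending(self) -> list[Example]:
        """Examples still waiting for a business user's decision."""
        return [e for e in self.examples if e.status == "pending"]

    def review(self, ex: Example, approve: bool) -> None:
        """Record the business user's decision; each example is reviewed once."""
        if ex.status != "pending":
            raise ValueError("example has already been reviewed")
        ex.status = "approved" if approve else "rejected"

    def training_data(self) -> list[tuple[str, str]]:
        """Only approved examples are handed to bot training."""
        return [(e.text, e.intent) for e in self.examples if e.status == "approved"]
```

Since `training_data` filters on the approval status, nothing a user teaches the bot can reach the model without an explicit decision by the business user.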