Guidelines for effective NLP training
The natural language understanding capabilities of bots can be influenced in two ways. On the one hand, you can provide user utterances for specific user intents. On the other hand, uninterpreted or incorrectly interpreted user inputs can be added as training data directly from the inbox. While both approaches are suitable for improving natural language understanding, they should be used in a targeted manner. This guide shows how to achieve high-quality training of Mercury.ai bots.
Here are the seven main principles you should keep in mind:
- Specify enough variation.
- Include approximately ten user utterances per intent.
- All intents should have roughly the same number of utterances.
- Do not use filler words.
- Avoid one-word utterances; use them only in edge cases.
- Define common utterances through callbacks.
- Use annotations to refine the understanding.
In the following, we will explain each of them in more detail.
NLP training has two main goals: to distinguish user utterances expressing one intent from user utterances expressing other intents, and to generalize from known user utterances to unknown ones. In order to help the NLP model achieve these goals, it is important to provide enough variation in the training data - both in terms of words and in terms of sentence structure.
Good: high variation
- How do I cancel my insurance?
- Can I cancel the contract?
- Information on withdrawing from contract
- What options do I have to cancel a policy?
Bad: low variation
- How do I cancel the insurance?
- How do I cancel the contract?
- How do I withdraw the insurance?
- How do I withdraw the contract?
The bad example shows what low variation means: the sentence structure is the same in all sentences, and the choice of words is very similar. In the good example, on the other hand, different terms and sentence structures were used, which helps the model generalize beyond the known examples and recognize the intent even if the wording deviates. A large number of low-variance training examples biases the NLP model to look for exact matches only.
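One rough way to quantify variation is the average word overlap between pairs of utterances. The following sketch (not part of the Mercury.ai platform; the scoring approach is our own illustration) uses Jaccard overlap of word sets, where a lower score means more variation:

```python
from itertools import combinations

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two utterances."""
    sa = set(a.lower().rstrip("?!.").split())
    sb = set(b.lower().rstrip("?!.").split())
    return len(sa & sb) / len(sa | sb)

def variation_score(utterances: list[str]) -> float:
    """Average pairwise overlap; lower values mean more variation."""
    pairs = list(combinations(utterances, 2))
    return sum(word_overlap(a, b) for a, b in pairs) / len(pairs)

high_variation = [
    "How do I cancel my insurance?",
    "Can I cancel the contract?",
    "Information on withdrawing from contract",
    "What options do I have to cancel a policy?",
]
low_variation = [
    "How do I cancel the insurance?",
    "How do I cancel the contract?",
    "How do I withdraw the insurance?",
    "How do I withdraw the contract?",
]

# The low-variation set scores much higher (more overlap) than the good set.
print(variation_score(high_variation) < variation_score(low_variation))  # True
```

Intents whose utterance sets score high on such a metric are candidates for rewriting with more varied wording and sentence structure.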
It is not so much the quantity as the quality of training examples that decides how well the resulting NLP model performs (see the first principle on variation). Still, the NLP model needs enough training data to learn at all. In general, about ten different training examples per intent are advisable. It is also possible to specify fewer training examples, but substantially more utterances should only be provided in exceptional cases.
Good: nine user utterances
- Hello
- Hi
- Hey
- Hi there
- Hello there
- Good morning
- Good evening
- Good day
- Greetings
Bad: two user utterances
- How are you
- What's up
Very large imbalances in the number of training examples per intent can cause those user intents with a high number of utterances to be detected more frequently than others - even in wrong situations. Intents with few training examples, on the other hand, are only recognized if there is an exact match between user input and training examples.
In order to not bias the interpretation towards one intent or the other, make sure that all intents have roughly the same amount of training data.
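A quick way to spot such imbalances before training is to compare per-intent example counts. A minimal sketch (the dict format and intent names are assumptions for illustration, not a Mercury.ai export format):

```python
def find_imbalance(training_data: dict[str, list[str]], max_ratio: float = 2.0):
    """Return (largest, smallest) intent names if the largest intent has more
    than max_ratio times the examples of the smallest, else None."""
    counts = {intent: len(utts) for intent, utts in training_data.items()}
    smallest = min(counts, key=counts.get)
    largest = max(counts, key=counts.get)
    if counts[largest] > max_ratio * counts[smallest]:
        return largest, smallest
    return None

training_data = {  # hypothetical example intents
    "cancel_contract": ["How do I cancel my insurance?"] * 10,
    "greeting": ["Good morning", "Good evening"],
}
print(find_imbalance(training_data))  # → ('cancel_contract', 'greeting')
```

When such a check fires, either add more varied utterances to the under-represented intent or trim redundant ones from the over-represented intent.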
When talking about variance in training data, we mean variance in terms that are actually indicative of the intent. Variance in filler words does not help: these are words that are often added in conversation but do not carry meaning with respect to the actual intent. Include filler words in a training example only if they have an important function for the meaning of the sentence and cannot simply be replaced by other words.
Good: no fillers
- How does it work?
- Can you explain how to use it?
- How can I use it?
- What do I need to do?
Bad: superfluous fillers
- How does this actually work?
- How can I use it anyway?
- Can you please explain how to possibly use it?
- Now, really, come on, what do I need to do here?
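Before adding utterances as training data, superfluous fillers can be stripped mechanically. A minimal sketch, assuming a hand-maintained filler list (the list below is illustrative, and multi-word fillers such as "come on" are not handled):

```python
# Illustrative single-word filler list; extend it for your domain.
FILLERS = {"actually", "anyway", "please", "possibly", "now", "really", "here"}

def strip_fillers(utterance: str) -> str:
    """Remove single-word fillers, keeping the rest of the utterance.
    Trailing punctuation on a word is ignored when matching fillers."""
    kept = [w for w in utterance.split() if w.lower().strip(",?.!") not in FILLERS]
    return " ".join(kept)

print(strip_fillers("Can you please explain how to possibly use it?"))
# → Can you explain how to use it?
```

Applied to the bad examples above, this yields utterances close to the good ones.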
You should use one-word utterances only if the chosen term clearly and unambiguously refers to that particular intent. Otherwise they undermine the NLP model's ability to differentiate between intents, as there is no distinctive wording or sentence structure left to tell the intents apart.
There are a number of general user intents that you want to use in many places in the bot, e.g. "Yes", "No", "Thank you", "Never mind" etc. Their meaning is often very context-dependent: "Yes" as an answer to the question whether you want to start should trigger a different bot reaction than "Yes" as an answer to the question whether a user wants to learn more about a product. In order to map these expressions dynamically to the contextually matching intents in the dialog, there are predefined hooks and callbacks. These expressions are pre-trained and therefore do not have to be included in the training data.
User utterances are the initial training data assigned to intents during development. Over the life cycle of a bot, user inputs that were not understood or were understood incorrectly can be annotated directly in the inbox. This avoids adding clutter to the user utterances.
However, it is important that annotations are consistent and free of errors; otherwise the performance of the NLP model will suffer. Similar user inputs should always be linked to the same intent. Inconsistencies can arise especially when several platform users annotate. Annotations can be reviewed in the training data tool, and incorrect entries can be deleted.
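Inconsistent annotations can be surfaced automatically by grouping near-identical inputs and checking whether they point at more than one intent. A sketch, assuming annotations are available as (utterance, intent) pairs (the export format is an assumption, not a documented Mercury.ai interface):

```python
from collections import defaultdict

def normalize(utterance: str) -> str:
    """Lowercase and strip punctuation/extra whitespace for comparison."""
    return " ".join(utterance.lower().strip("?!. ").split())

def find_conflicts(annotations):
    """Return utterances that were annotated with more than one intent."""
    intents_by_utterance = defaultdict(set)
    for utterance, intent in annotations:
        intents_by_utterance[normalize(utterance)].add(intent)
    return {u: sorted(i) for u, i in intents_by_utterance.items() if len(i) > 1}

annotations = [  # hypothetical inbox annotations
    ("How do I cancel?", "cancel_contract"),
    ("how do i cancel", "withdraw_order"),
    ("Good morning", "greeting"),
]
print(find_conflicts(annotations))
# → {'how do i cancel': ['cancel_contract', 'withdraw_order']}
```

Running such a check periodically is especially useful when several platform users annotate, since each conflict it reports is a candidate for cleanup in the training data tool.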