Experience Training

Author: Max Davish, Product Manager
Product: Answers
Blog Date: November 2021

A New Way to Collect Training Data at Scale

You can't build a modern search engine without using machine learning. Or at least you shouldn't. Over the past ten years, advances in machine learning — and in particular deep learning and transformer neural networks — have enabled us to solve problems in search that were previously unsolvable. New models like BERT allow us to understand search queries in a deeper, more nuanced way than older, keyword-based approaches.

As with most things in search, Google has been the pioneer in machine-learning approaches to search. Soon after their BERT breakthrough in 2018, they implemented BERT in their search algorithm, calling it "one of the biggest leaps forward in the history of search."

Yext Answers uses BERT, along with many other similar transformer models, in many of the same ways Google does. A few of Yext's use cases include…

  • Disambiguating named entities, like "Edward Norton" and "Edward, North Carolina"
  • Producing featured snippets with question-answering models
  • Encoding semantic vectors to perform semantic search

All of these features rely on supervised machine learning, which means the underlying models need labeled training data (and lots of it) to work. For example, in order to build a question-answering model optimized for search, we need tens of thousands of question/answer pairs to train the neural network.
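
To make that data requirement concrete, here's a rough sketch of what a single labeled example for extractive question answering typically looks like (the SQuAD-style format common in the research community), along with an off-the-shelf model making a prediction. This is a generic illustration using the open-source Hugging Face transformers library, not Yext's internal schema or production model.

```python
from transformers import pipeline

context = (
    "Joe Biden served as vice president from 2009 to 2017 and was "
    "inaugurated as the 46th president on January 20, 2021."
)
answer = "January 20, 2021"

# One SQuAD-style labeled example: the model learns to predict the start and
# end of the answer span within the context passage.
qa_example = {
    "question": "When did Joe Biden become president?",
    "context": context,
    "answers": {"text": [answer], "answer_start": [context.find(answer)]},
}

# Off-the-shelf extractive QA model from the public Hugging Face hub
# (purely illustrative, not Yext's production model).
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
prediction = qa(question=qa_example["question"], context=qa_example["context"])
print(prediction["answer"], prediction["score"])
```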

This is both the fundamental strength and weakness of supervised machine learning: the algorithm can implicitly learn from labeled examples, without the need to encode rules or heuristics. But the algorithm needs a huge number of these examples, which can be challenging to produce.

As Michael Misiewicz, Yext's Head of Data Science, described in his recent post, we have a sophisticated apparatus for data labeling at Yext, consisting of a multilingual labeling team, proprietary labeling software, and thorough quality control guidelines. But we can only do so much labeling ourselves.

We also wanted to give Yext administrators the ability to train the Answers algorithm themselves, by giving direct feedback on our algorithms' predictions. To that end, we developed our Experience Training framework.

Experience Training allows admins to give direct feedback on predictions made by the various supervised ML models in Yext Answers. For example, admins can approve or reject the Featured Snippets produced by our question-answering model.

When an admin provides feedback, two things happen:

  1. The Answers algorithm immediately modifies its prediction via an override layer.
  2. The admin's feedback enters into a training pipeline and becomes training data for future versions of the model.

In other words, the admin's feedback takes effect immediately, while also being incorporated asynchronously into our continuous model retraining pipeline. Let's dive deeper into each of these aspects.
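
Before we do, here's a high-level sketch of how a single piece of feedback fans out to both paths. The `SnippetFeedback` schema, the `override_store`, and the `training_queue` are hypothetical stand-ins for illustration, not Yext's actual internals.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SnippetFeedback:
    """One piece of admin feedback on a featured-snippet prediction (hypothetical schema)."""
    experience_key: str             # which Answers experience the feedback applies to
    query: str                      # the search term the admin was reviewing
    document: str                   # the source text the snippet was extracted from
    approved_answer: Optional[str]  # the approved/corrected span, or None if rejected

def handle_feedback(feedback: SnippetFeedback, override_store: dict, training_queue: list) -> None:
    # Path 1 -- immediate: record an override so the serving path prefers the
    # admin's answer (or suppresses the snippet) on the very next search.
    key = (feedback.experience_key, " ".join(feedback.query.lower().split()))
    override_store[key] = feedback.approved_answer  # None means "show no snippet"

    # Path 2 -- asynchronous: emit a labeled example for the next retraining run.
    has_answer = feedback.approved_answer is not None
    training_queue.append({
        "question": feedback.query,
        "context": feedback.document,
        "answers": {
            "text": [feedback.approved_answer] if has_answer else [],
            "answer_start": [feedback.document.find(feedback.approved_answer)] if has_answer else [],
        },
    })
```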

The Override Layer

The override layer ensures that the Answers algorithm always prioritizes admin feedback over its own native predictions. For example, if an admin has rejected a featured snippet in Experience Training, it immediately stops showing up. If they've modified it, the modification takes effect immediately. If they've approved it, it keeps showing up, even if subsequent, retrained versions of the model no longer produce the same prediction.

When the algorithm requests a prediction from any ML model, it first consults the override layer. If an admin has provided explicit feedback on a prediction in the past, then we simply surface their preferred prediction and disregard what the underlying algorithm predicted.
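
In rough pseudocode, the serving path looks something like the sketch below. The store and model interfaces are hypothetical; this only illustrates the consult-then-fall-back behavior just described.

```python
def serve_featured_snippet(experience_key, query, documents, override_store, qa_model):
    """Return the snippet to show for a query, preferring admin overrides (hypothetical names)."""
    key = (experience_key, " ".join(query.lower().split()))

    # Consult the override layer first: has an admin explicitly approved,
    # modified, or rejected the prediction for this exact query?
    if key in override_store:
        return override_store[key]  # may be None, meaning "show no snippet"

    # No override on record: fall back to the QA model's native prediction.
    return qa_model.predict(question=query, documents=documents)
```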

Let's walk through an example. Imagine the algorithm has predicted an incorrect snippet (which happens from time to time, even on Google).

The algorithm here mistakenly predicts that Joe Biden became president from "2009 to 2017," which is obviously wrong. To fix this misprediction, an admin could proceed to the Training section and modify the prediction to the correct span of text — "January 20, 2021". (They could also remove the snippet altogether, if there were no correct answers in the text.)

As soon as the admin applies this change, the algorithm instantly reacts to their feedback and begins rendering the correct prediction.

But, importantly, the model has not been retrained yet. At Yext we retrain many of our models nightly (and we're getting faster all the time), but we can't retrain a model every single time an admin gives us feedback. Nor can we ask admins to wait hours or days for their feedback to take effect while we retrain the model.

That's where the override layer comes in. Before the algorithm serves a prediction, it consults the override layer to check whether an admin has given feedback on that particular prediction and, if so, serves the admin's preferred answer instead. In this case, an admin has given feedback on this prediction, so we discard our question-answering model's prediction in favor of the admin's feedback.

It's important to understand that each override is scoped to a specific query in a specific experience. Overrides in one account have no effect on any other account, and an override only applies to one search term at a time. For example, if you were to rephrase this question as "when was joe biden inaugurated", it would bypass the override layer altogether and proceed to the question-answering model itself.
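
One way to picture that scoping is as the composite key an override is stored under. In this hypothetical sketch, the account, the experience, and the lightly normalized query text are all part of the key, which is why a rephrased query misses the override entirely:

```python
def override_key(account_id: str, experience_key: str, query: str) -> tuple:
    """Build the composite key an override is scoped to (hypothetical)."""
    # Only light normalization (case and whitespace) is applied, so a
    # semantically equivalent rephrasing still produces a different key.
    normalized_query = " ".join(query.lower().split())
    return (account_id, experience_key, normalized_query)

# These two searches hit the same override:
override_key("acct-1", "help-site", "When did Joe Biden become president")
override_key("acct-1", "help-site", "when did joe biden  become president ")

# This rephrasing misses it, bypasses the override layer, and goes to the model:
override_key("acct-1", "help-site", "when was joe biden inaugurated")
```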

Fortunately, Experience Training also makes the model itself smarter, so it can surface better answers to similar queries in the future.

The Training Pipeline

The second — and ultimately more important — outcome of admin feedback is more, and richer, training data for our ML models. While the override layer acts mostly as a way to quickly spot-fix incorrect predictions, using this feedback as training data lets us generalize what we've learned and fix not only the query in question but also other queries like it.

Going back to the example above, this training example would teach the algorithm the right answer not only to "when did joe biden become president" but also to similar questions, like…

  • "when did abe lincoln become president?"
  • "when was joe biden inaugurated?"
  • "when did nixon leave office?"

The more examples the algorithm has seen — particularly within a given domain — the better it will perform.

How does this actually work? Continuous model retraining and evaluation is one of the biggest challenges in machine learning. It requires sophisticated infrastructure for juggling multiple models in production, often serving different customer segments and geographies, as well as robust tools for evaluating each model upgrade as quickly as possible.
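
One small but important piece of that tooling is an automated gate that only promotes a freshly retrained candidate if it doesn't regress against the model currently serving. A bare-bones version of such a gate might look like this; the metric names and threshold are made up for illustration:

```python
def should_promote(candidate_metrics: dict, production_metrics: dict,
                   max_regression: float = 0.005) -> bool:
    """Promote a retrained model only if no tracked metric regresses (illustrative)."""
    for metric, prod_value in production_metrics.items():
        if candidate_metrics.get(metric, 0.0) < prod_value - max_regression:
            return False  # e.g. exact-match or F1 dropped: keep the current model serving
    return True

# Example: the nightly retrain improved F1 and held exact match steady, so it ships.
production = {"exact_match": 0.81, "f1": 0.88}
candidate = {"exact_match": 0.81, "f1": 0.90}
assert should_promote(candidate, production)
```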

There are several key systems that help us bring new models into production as quickly as possible, while minimizing errors:

  • Automated Retraining Pipelines: At Yext we have a sophisticated system for continuously retraining our models on the new labeled data generated by Experience Training and other sources. Most key models are retrained at least on a daily basis, and performance metrics are automatically computed to detect regressions or model drift.
  • Gradual Deployments: When a model is ready for deployment, we activate it for a gradually increasing percentage of our traffic. Starting small and expanding to our entire customer base gives us an opportunity to catch unforeseen regressions before they become big problems (see the sketch after this list).
  • Model Management Platform: At any given time, we have dozens of models performing different tasks in different languages across different geographies. Keeping track of all of them is enormously difficult, so we created a model management platform that allows data scientists and engineers to easily see which models are available and serving in which regions, and to deploy, re-deploy, and shut them down as necessary without interrupting real traffic.
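
To give a flavor of the gradual-deployment step, one common approach is to split traffic deterministically by hashing a stable request attribute, so a given experience consistently sees either the current or the candidate model while the rollout percentage ramps up. This is a generic sketch of that technique, not Yext's deployment system:

```python
import hashlib

def use_candidate_model(experience_key: str, rollout_percent: int) -> bool:
    """Deterministically route a fixed slice of traffic to the new model (illustrative)."""
    # Hashing a stable attribute means the same experience always gets the same
    # decision at a given rollout percentage, which makes regressions easy to attribute.
    digest = hashlib.sha256(experience_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# A ramp might go 1% -> 5% -> 25% -> 100% as confidence in the new model grows.
model_version = "candidate" if use_candidate_model("acct-1:help-site", 5) else "production"
```

Deterministic routing, as opposed to random per-request assignment, makes it easier to attribute a regression to the candidate model and roll it back cleanly.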

Having an efficient, error-proof system for model deployment allows us to turn our training data into performance improvements as quickly as possible. It expedites the virtuous feedback cycle that makes our algorithm smarter and better able to answer users' questions — which is, after all, the whole goal of Experience Training!
