Product Update: Improved Location Search via BERT

TL;DR: Location Search is now smarter in English. If you see anything not working, please fill out this form.

Background

Today, we rolled out a large Answers algorithm update to improve Location Search in English. This change applies to all experiences. The goal is to use the entire query to better understand whether a user is talking about a place.

For example, take the following two queries:

Bank near Orlando

Orlando Bloom

Both of these queries include the token “Orlando”. However, in the first, the user is clearly referring to the place Orlando, while in the second, the user is referring to the person Orlando Bloom. Classifying “Orlando” as a place in the first query and as part of a name in the second is called Named Entity Recognition (NER).

If both queries include the token “Orlando”, why is it so easy for a human to tell the difference? Because you aren’t looking at “Orlando” in isolation; you are looking at it in context. In the first query, basically any word that follows “Bank near” is going to be a place name. In the second, seeing “Orlando” right next to “Bloom” makes it look like a person’s name.
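To see why context matters, consider a toy sketch of the opposite approach: a context-free dictionary (“gazetteer”) lookup that flags any token appearing in a list of place names. The gazetteer, function name, and queries below are hypothetical illustrations, not Yext’s actual implementation:

```python
# Illustrative sketch only: a context-free gazetteer lookup, the kind of
# matching that produces the "Orlando Bloom" false positive described above.
GAZETTEER = {"orlando", "chelsea", "sandy"}

def naive_location_tokens(query: str) -> list[str]:
    """Flag any token that appears in the place-name list, ignoring context."""
    return [tok for tok in query.lower().split() if tok in GAZETTEER]

# Both queries trigger on "orlando" -- the lookup cannot tell them apart.
print(naive_location_tokens("Bank near Orlando"))  # ['orlando']
print(naive_location_tokens("Orlando Bloom"))      # ['orlando'] (false positive)
```

A contextual model like BERT avoids this failure mode because it classifies each token based on the words around it, not on the token alone.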

Using BERT to improve NER

This is where BERT comes in. BERT (Bidirectional Encoder Representations from Transformers) is designed to learn the contextual relationship between words in a text. Using the approach outlined in BERT, we are able to drastically improve our ability to perform NER on text where the main hint to a human is how the word is used in context.

Previously, Answers struggled to distinguish queries like “Bank near Orlando” and “Orlando Bloom”, and would occasionally produce a false positive on “Orlando” in the “Orlando Bloom” query. With this new approach, Answers knows that one is a location and one is a person, and handles each accordingly.

To do this, we manually labeled 72,916 search queries to teach BERT how to identify locations and how to identify tokens that are NOT locations. This model will continue to improve over time as we label more queries.
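As a rough illustration of what that labeled data might look like, here is a sketch using the common BIO tagging scheme for NER (B-LOC opens a location span, I-LOC continues it, O marks everything else). The label names and the specific examples are assumptions for illustration, not Yext’s internal format:

```python
# Hypothetical NER training examples: each token in a query gets a label,
# so the model learns a per-token, in-context decision about locations.
labeled_queries = [
    (["Bank", "near", "Orlando"], ["O", "O", "B-LOC"]),
    (["Orlando", "Bloom"],        ["O", "O"]),
    (["bbva", "near", "orlando"], ["O", "O", "B-LOC"]),
    (["The", "villages", "fl"],   ["B-LOC", "I-LOC", "I-LOC"]),
]

# Labels must align one-to-one with tokens for token classification.
for tokens, tags in labeled_queries:
    assert len(tokens) == len(tags)
```

Note how the same token (“Orlando”) carries different labels depending on its context, which is exactly the signal the model is trained on.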

According to our analysis, a full 12.8% of all queries will be improved by this change (in absolute terms). For context, approximately 17.9% of queries submitted to Yext Answers concern location, and BERT NER improved 72% of that set.
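The arithmetic behind those figures can be checked directly: 72% of the ~17.9% location-query share comes to roughly 12.9% of all queries, which lines up with the reported 12.8% once you allow for rounding in the underlying numbers. A quick sanity check:

```python
# 17.9% of queries concern location; BERT NER improved 72% of that set.
location_share = 0.179
improved_within_set = 0.72

# Absolute share of all queries that improved.
absolute_improved = location_share * improved_within_set
print(round(absolute_improved, 3))  # 0.129, i.e. ~12.9% of all queries
```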

Example Improvements

The improvement mainly reduces the number of false positives, but it also fixes some false negatives.

False Positives

Here are some examples of false positives that have been fixed. Previously, these search terms were incorrectly marked as including locations even though they contain none.

  • Funeral notices for Jean irvine
  • Sandy close
  • i wish to settle an invoice
  • Billing and payment
  • adrian newman
  • Dr. French
  • Sarah Cate, MD
  • Providers who take Oxford Health

False Negatives

Here are some example queries where a location should have been detected but previously was not. Overall, these are less common, since our previous iteration erred heavily on the side of recall.

  • The villages fl
  • bbva near orlando
  • chelsea obgyn

Regressions

Generally, the old approach was overzealous in detecting locations where no true location was intended by the user. Using BERT, we are now much more precise, as BERT “understands” language better. However, this does mean that the regressions we expect are mostly failures to detect a location when one is present. These usually result from queries using vocabulary that is unfamiliar to BERT, and submitting these regressions helps us improve the model going forward.

To help us continue to make this model better, please fill out this form with any regressions you find. We will constantly be improving the model, and we will let you know when your issues are fixed. If you have a query that should detect a location but doesn’t, or a query that doesn’t include a location but matches one anyway, please let us know.

Using arbitrary text fields to match to locations

One final area that can cause issues is experiences that use arbitrary text fields for location search (as opposed to the builtin.location field). For example:

  • Custom Jobs entity with Location custom field
  • Entities with Single-Line Text or Text List Fields storing City Names, or other location names
  • Entities storing Neighborhood or Points of Interest fields

To fix these areas there are two options:

  • Easiest / Short Term: You can move the client to the monthly tier of the Answers Algorithm. This tier will not have the new location improvements, but it won’t require you to change anything in the graph. Once the update is out, you’ll be able to run a version comparison against queries to quantify the regressions.
  • Fix the underlying issue:
    • With Custom Jobs entities you should migrate them to the standard Jobs entities and use the new location field
    • Single-line text and text-list fields storing city names should no longer be needed, given this change and the other improvements over the last 3 months. If you still see cases where they are, please submit the form above.
    • We are working on improving Neighborhood and broader POI search so stay tuned for updates here. For now if these are important it makes the most sense to stay on the monthly tier.

FAQ

Does this improve queries of all lengths?

No, this improvement only applies to queries that have at least two tokens. If there is only one token, there isn’t anything to be learned from the context of that token.

Which languages does this apply to?

For now it’s English only, but we will be rolling this out to other languages in the future.

What about using this approach for non-location entities?

Great idea! We are continuing to explore ways to use the approach outlined in BERT to improve search for other entity types. Stay tuned for more updates over the coming months. Even by focusing only on location queries, we improve non-location queries by avoiding false positives.