Tokens and Results Ranking | Hitchhikers Platform
What You’ll Learn
In this section, you will learn:
- What the tokens of a query are
- The logic behind how we rank results on relevance
You might be wondering how we break down a natural language query to derive filters from it. We do this by tokenizing queries.
Tokenization is breaking down a query into discrete units - aka, words! Tokens are used to determine the candidates for matching to searchable fields.
To derive tokens, we’ll:
- Split out individual words based on white space
- Strip out casing & punctuation
- Ignore common words (called ‘stop’ words) that do not add meaning to the query.
For a query on Yext.com like “What are your products?”, the derived token would be ‘products’. A candidate for a token match might be an Entity Type of ‘Product’.
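To make the steps above concrete, here is a minimal Python sketch of tokenization. It is an illustration only, not the actual Search implementation: the `tokenize` function and the `STOP_WORDS` set are hypothetical, and the stop words are assumed for this example.

```python
import string

# Hypothetical stop words, assumed for this example only.
STOP_WORDS = {"what", "are", "your"}

def tokenize(query, stop_words=STOP_WORDS):
    """Split on white space, strip casing and punctuation, drop stop words."""
    tokens = []
    for word in query.split():
        # Lowercase, then remove leading and trailing punctuation.
        normalized = word.lower().strip(string.punctuation)
        if normalized and normalized not in stop_words:
            tokens.append(normalized)
    return tokens

print(tokenize("What are your products?"))  # ['products']
```

Note that only leading and trailing punctuation is stripped here, so a hyphenated word like “tri-state” stays as one token.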
How Synonyms Impact Tokens
Another important concept is synonyms. These allow us to translate tokens into different variations that mean the same thing. For example, although we might have ‘jobs’ as an entity type in the platform, we might want the same results to show for ‘careers’, ‘positions’, or ‘vacancies’. These synonyms can be defined for each search experience via the config. If you need a reminder on what synonyms are, you can revisit the Core Configuration - Synonyms module.
How Stop Words Impact Tokens
As noted above, to derive tokens, our algorithm will account for “stop words”. These are words that the Search algorithm treats differently to deliver more accurate results to the user. The Search algorithm already has a built-in list of stop words that streamlines a query at the time of the search. This list includes words such as “in”. These words can distract from the important tokens of a query (e.g. “the best bankers in the tri-state area” becomes “best bankers tri-state area”, focusing on the entity, the rating, and the location).
As you learned in the Configuration Controls unit earlier, you can also set additional stop words, and it’s important to know how the Search algorithm actually treats these stop words at the time of search. Here is how stop words interact with textSearch and nlpFilter:
- textSearch - stop words are given a much smaller weight than other words in a query when matching on text search fields. This means that while they’re not completely ignored, stop words have a smaller and smaller effect as query length increases.
- nlpFilter - stop words can still be matched in filters, but the Search algorithm will not match a filter that only matches stop words. For example, if you have a filter for “Cancer Care” and a stop word for “care”, the query “Cancer Care” will pull this filter. However, if you search for “Urgent Care”, it will not match on this filter.
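The nlpFilter rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `filter_matches` is a hypothetical helper, and it treats any shared word as a match, which the real algorithm may not do. It only demonstrates that a filter is rejected when the words it matches are all stop words.

```python
import string

def normalize(text):
    """Lowercase the text and strip punctuation from each word."""
    words = (w.lower().strip(string.punctuation) for w in text.split())
    return [w for w in words if w]

def filter_matches(query, filter_value, stop_words):
    """Return True only if the query matches at least one non-stop word of the filter."""
    query_words = set(normalize(query))
    matched = [w for w in normalize(filter_value) if w in query_words]
    # Stop words may be part of a match, but cannot be the whole match.
    return any(w not in stop_words for w in matched)

stop_words = {"care"}
print(filter_matches("Cancer Care", "Cancer Care", stop_words))  # True
print(filter_matches("Urgent Care", "Cancer Care", stop_words))  # False
```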
Now let’s put it all together using the example query on Yext.com from above: “What are your products?”. We can observe how the Search Factors parse the query when “what”, “are”, and “your” are stop words and “products”, “services”, and “offers” are a synonym set:
- Original Query: What are your products?
- Normalized: what are your products
- Stop Words: what, are, your
- Tokens: products
- Synonyms: products → services, offers
- Search Terms Evaluated:
  - what are your products
  - what are your services
  - what are your offers
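The walkthrough above can be reproduced with a short Python sketch. The `search_terms` function and the data structures are hypothetical and only mirror the example. We assume here that each synonym replaces the token in the normalized query, which is what the list of evaluated search terms suggests.

```python
import string

# Values taken from the example above; the structures themselves are assumptions.
STOP_WORDS = {"what", "are", "your"}
SYNONYM_SETS = [("products", "services", "offers")]

def normalize(query):
    """Lowercase the query and strip punctuation from each word."""
    words = (w.lower().strip(string.punctuation) for w in query.split())
    return [w for w in words if w]

def search_terms(query, stop_words=STOP_WORDS, synonym_sets=SYNONYM_SETS):
    """Return the normalized query plus one variant per synonym of each token."""
    words = normalize(query)
    terms = [" ".join(words)]
    for i, word in enumerate(words):
        if word in stop_words:
            continue  # only tokens (non-stop words) are expanded
        for synonym_set in synonym_sets:
            if word in synonym_set:
                for synonym in synonym_set:
                    if synonym != word:
                        variant = words[:i] + [synonym] + words[i + 1:]
                        terms.append(" ".join(variant))
    return terms

for term in search_terms("What are your products?"):
    print(term)
```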
Once the algorithm has found the most relevant results for the user’s Search Term, it must then decide how to rank those entities within the Vertical.
To do so, it considers the following elements:
Location Radius with Location Intent
If location intent is present in a query, we want to filter down the results to entities within an appropriate radius of that location.
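As a rough illustration of radius filtering, the Python sketch below keeps only entities within a given distance of the detected location. The great-circle (haversine) formula, the 50 km radius, and the entity records are all assumptions for this example. How the Search algorithm actually picks an “appropriate radius” is not covered here.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(entities, center, radius_km):
    """Keep entities whose coordinates fall within radius_km of center."""
    lat, lon = center
    return [e for e in entities if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km]

# Hypothetical entities; the query location is assumed to be New York City.
entities = [
    {"name": "Newark", "lat": 40.7357, "lon": -74.1724},
    {"name": "Philadelphia", "lat": 39.9526, "lon": -75.1652},
]
nearby = within_radius(entities, center=(40.7128, -74.0060), radius_km=50)
print([e["name"] for e in nearby])  # ['Newark']
```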
There are a number of components that determine relevance, but we’ll review two important concepts below.
Number of Matched Tokens
The number of token matches adds to the ranking score.
For the query “How do I join hitchhikers?”, an FAQ titled “Where can I sign up to join Yext Hitchhikers” has more token matches than an FAQ titled “What is the Yext Hitchhikers Program”, so it would be ranked higher.
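Here is a small Python sketch of counting token matches for the two FAQ titles above. The stop words (“how”, “do”, “i”) and the `token_match_count` helper are assumptions for illustration. The real ranking score combines more signals than this single count.

```python
import string

# Assumed stop words for this example only.
STOP_WORDS = {"how", "do", "i"}

def words_of(text):
    """Lowercase the text and strip punctuation from each word."""
    words = (w.lower().strip(string.punctuation) for w in text.split())
    return [w for w in words if w]

def token_match_count(query, title, stop_words=STOP_WORDS):
    """Count how many query tokens (non-stop words) appear in the title."""
    tokens = {w for w in words_of(query) if w not in stop_words}
    return len(tokens & set(words_of(title)))

query = "How do I join hitchhikers?"
titles = [
    "What is the Yext Hitchhikers Program",
    "Where can I sign up to join Yext Hitchhikers",
]
# More token matches means a higher rank.
ranked = sorted(titles, key=lambda t: token_match_count(query, t), reverse=True)
print(ranked[0])  # Where can I sign up to join Yext Hitchhikers
```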
Number of Non-Token Matches
For the query “How is search changing?”, an FAQ titled “How is search changing?” would rank higher than an FAQ titled “How is voice search changing?”, because the latter contains more non-token matches, that is, words that do not match the query (here, “voice”).
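The comparison above can be sketched in Python by counting the words in each title that do not appear in the query. This reading of “non-token matches” and the `non_token_match_count` helper are assumptions for illustration, not the actual scoring logic.

```python
import string

def words_of(text):
    """Lowercase the text and strip punctuation from each word."""
    words = (w.lower().strip(string.punctuation) for w in text.split())
    return [w for w in words if w]

def non_token_match_count(query, title):
    """Count the words in the title that do not appear in the query."""
    query_words = set(words_of(query))
    return sum(1 for w in words_of(title) if w not in query_words)

query = "How is search changing?"
titles = ["How is voice search changing?", "How is search changing?"]
# Fewer non-matching words means a higher rank.
ranked = sorted(titles, key=lambda t: non_token_match_count(query, t))
print(ranked)  # ['How is search changing?', 'How is voice search changing?']
```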
Location Distance - Distance from User’s Location or Specified Location
Lastly, if an entity has an associated location, we will sort entities by their proximity to the user’s location or the location specified in the query.
Note that the ranking can be overridden through custom sorting on the backend. For instance, if one location is 500 feet closer than another, that 500 feet hardly matters to the user. At that point, results should instead be sorted by relevance. The Bucketed Distance feature converts a continuous distance value into a discrete range, making it easier to sort by both distance and relevance.
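To show the idea behind bucketing, here is a minimal Python sketch that groups distances into fixed-size ranges and sorts by bucket first, then by relevance. The one-mile bucket size, the `relevance` scores, and the function names are hypothetical. The actual Bucketed Distance configuration is covered in the sorting module.

```python
def distance_bucket(distance_miles, bucket_size_miles=1.0):
    """Map a continuous distance to a discrete bucket index."""
    return int(distance_miles // bucket_size_miles)

def sort_results(results, bucket_size_miles=1.0):
    """Sort by distance bucket (nearest first), then by relevance (highest first)."""
    return sorted(
        results,
        key=lambda r: (distance_bucket(r["distance_miles"], bucket_size_miles), -r["relevance"]),
    )

# Hypothetical results: B is roughly 500 feet farther away than A but more relevant.
results = [
    {"name": "A", "distance_miles": 0.30, "relevance": 0.6},
    {"name": "B", "distance_miles": 0.39, "relevance": 0.9},
    {"name": "C", "distance_miles": 3.20, "relevance": 0.95},
]
print([r["name"] for r in sort_results(results)])  # ['B', 'A', 'C']
```

Because A and B land in the same bucket, the more relevant B comes first, while C stays last despite its high relevance because it sits in a farther bucket.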
You’ll learn about this and other ways to add custom sorting in the Advanced Search - Sorting Module.