Tokens and Results Ranking | Hitchhikers Platform

What You’ll Learn

In this section, you will learn:

  • What the tokens of a query are
  • The logic behind how we rank results by relevance

New Release Feature
This unit references a new bucketed distance update from our September ‘21 Monthly release. Test out the new feature in Hitchhikers and Playground Accounts. To turn on this beta feature (“Winter ‘21: Answers Distance Bucketing”) in your Production account, you can fill out this form here.

Tokens

You might be wondering how we break a natural language query down into the filters we derive from it. We’re able to do this by tokenizing the query.

Tokenization is breaking down a query into discrete units - aka, words! Tokens are used to determine the candidates for matching to searchable fields.

To derive tokens, we’ll:

  • Split out individual words based on white space
  • Strip out casing & punctuation
  • Ignore common words (called ‘stop’ words) that do not add meaning to the query.

For a query on Yext.com like “What are your products?” the token derived here would be ‘products’. The candidates for token matches might be an Entity Type of ‘Product’.
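The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Answers algorithm itself: the `tokenize` function and the `STOP_WORDS` set are hypothetical, and the real built-in stop word list is not reproduced here.

```python
import string

# Hypothetical stop-word list for illustration only; the platform's
# built-in list is not shown in this unit.
STOP_WORDS = {"what", "are", "your", "of", "the", "in"}

def tokenize(query, stop_words=STOP_WORDS):
    """Split on white space, strip casing and punctuation, drop stop words."""
    tokens = []
    for word in query.split():
        # Lowercase and remove leading/trailing ASCII punctuation.
        normalized = word.lower().strip(string.punctuation)
        if normalized and normalized not in stop_words:
            tokens.append(normalized)
    return tokens

print(tokenize("What are your products?"))  # ['products']
```

Only punctuation at the edges of a word is stripped in this sketch, so a hyphenated word such as “tri-state” stays intact as one token.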

How Synonyms Impact Tokens

Another important concept is synonyms. These allow us to translate tokens into different variations that mean the same thing. For example, although we might have ‘jobs’ as an entity type in the platform, we might want the same results to show for ‘careers’, ‘positions’, or ‘vacancies’. These synonyms can be defined for each Answers experience via the config. If you need a reminder on what synonyms are, you can revisit the Core Configuration - Synonyms module.

How Stop Words Impact Tokens

As noted above, to derive tokens, our algorithm will account for “stop words”. These are words that the Answers algorithm treats differently to deliver more accurate results to the user. The Answers algorithm already has a built-in list of stop words that streamlines a query at the time of the search. This list includes words such as of, the, in, etc. These words can distract from the important tokens of a query. For example, “the best bankers in the tri-state area” becomes “best bankers tri-state area”, focusing on the entity, the rating, and the location.

As you learned in the Configuration Controls unit earlier, you can also set additional stop words, and it’s important to know how the Answers algorithm actually treats these stop words at the time of search. Here is how stop words interact with textSearch and nlpFilter:

  • textSearch - stop words are given a much smaller weight than other words in a query when matching on text search fields. This means that while they’re not completely ignored, stop words have a smaller and smaller effect as query length increases.
  • nlpFilter - stop words can still be matched in filters, but the Answers algorithm will not match a filter that only matches stop words. For example, if you have a filter for “Cancer Care” and a stop word for “care”, the query “Cancer Care” will pull this filter. However, if you search for “Urgent Care”, it will not match on this filter.
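To make the nlpFilter rule concrete, here is a small sketch of the “Cancer Care” example. It assumes simple whole-word matching, which is a simplification; `filter_matches` is a hypothetical helper, not a platform function.

```python
def filter_matches(filter_name, query, stop_words):
    """Return True if the query matches the filter on at least one non-stop word."""
    filter_words = set(filter_name.lower().split())
    query_words = set(query.lower().split())
    matched = filter_words & query_words
    # Stop words may be part of a match, but a match made up
    # only of stop words is rejected.
    return any(word not in stop_words for word in matched)

stop_words = {"care"}
print(filter_matches("Cancer Care", "Cancer Care", stop_words))  # True
print(filter_matches("Cancer Care", "Urgent Care", stop_words))  # False
```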

Now let’s put it all together using the example query on Yext.com: “What are your products?”. We can observe how the Search Factors parse the query when “what”, “are”, and “your” are stop words and “products”, “services”, and “offers” form a synonym set:

  • Original Query: What are your products?
  • Normalized: what are your products
  • Stop Words: what, are, your
  • Tokens: products
  • Synonyms: products → services, offers
  • Search Terms Evaluated:
    • what are your products
    • what are your services
    • what are your offers
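The expansion from one normalized query to the three Search Terms above can be sketched as follows. This is an illustrative assumption about how a synonym set might be applied; `expand_search_terms` is a hypothetical name, and the real algorithm may expand terms differently.

```python
def expand_search_terms(normalized_query, synonym_set):
    """Return the query plus one variant per synonym of any word in the set."""
    words = normalized_query.split()
    terms = [normalized_query]
    for i, word in enumerate(words):
        if word in synonym_set:
            for synonym in synonym_set:
                if synonym != word:
                    # Swap the matched word for its synonym, keep the rest.
                    variant = words[:i] + [synonym] + words[i + 1:]
                    terms.append(" ".join(variant))
    return terms

synonyms = ["products", "services", "offers"]
for term in expand_search_terms("what are your products", synonyms):
    print(term)
```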

Ranking Logic

Once the algorithm has found the most relevant results for the user’s Search Term, it must then decide how to rank those entities within the Vertical.

To do so, it considers the following elements:

Location Radius with Location Intent

If location intent is present in a query, we want to filter down the results to entities within an appropriate radius of that location.

Relevance

There are a number of components that determine relevance, but we’ll review two important concepts below.

Number of Matched Tokens

The number of token matches adds to the ranking score.

For the query “How do I join hitchhikers?”, an FAQ titled “Where can I sign up to join Yext Hitchhikers” has more token matches than an FAQ titled “What is the Yext Hitchhikers Program”, so it would be ranked higher.
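As a rough sketch of this comparison, the snippet below counts how many query tokens appear in each FAQ title. The `STOP_WORDS` set and both helper functions are hypothetical, and the real ranking score has more components than this single count.

```python
import string

# Hypothetical stop words, chosen only to make this example work.
STOP_WORDS = {"how", "do", "i", "is", "the", "what", "where", "can", "to"}

def tokens(text):
    """Lowercase, strip edge punctuation, and drop stop words."""
    cleaned = (w.lower().strip(string.punctuation) for w in text.split())
    return {w for w in cleaned if w and w not in STOP_WORDS}

def matched_token_count(query, title):
    """Count the query tokens that also appear in the title."""
    return len(tokens(query) & tokens(title))

query = "How do I join hitchhikers?"
print(matched_token_count(query, "Where can I sign up to join Yext Hitchhikers"))  # 2
print(matched_token_count(query, "What is the Yext Hitchhikers Program"))  # 1
```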

Number of Non-Token Matches

For the query “How is search changing?”, an FAQ titled “How is search changing?” would rank higher than an FAQ titled “How is voice search changing?”, because the latter contains an extra word (“voice”) that does not match any token in the query.
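One way to picture how the two relevance signals combine is a sort key that prefers more matched words first and fewer unmatched words second. This is a simplified sketch under our own assumptions: `rank_key` is hypothetical, stop words are ignored for brevity, and “non-token matches” is interpreted here as title words with no counterpart in the query.

```python
import string

def words(text):
    """Lowercase each word and strip leading/trailing punctuation."""
    cleaned = (w.lower().strip(string.punctuation) for w in text.split())
    return [w for w in cleaned if w]

def rank_key(query, title):
    """Sort key: more matched words first, then fewer unmatched words."""
    query_words = set(words(query))
    title_words = words(title)
    matched = sum(1 for w in title_words if w in query_words)
    unmatched = len(title_words) - matched
    return (-matched, unmatched)

query = "How is search changing?"
titles = ["How is voice search changing?", "How is search changing?"]
ranked = sorted(titles, key=lambda t: rank_key(query, t))
print(ranked[0])  # How is search changing?
```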

Location Distance - Distance from User’s Location or Specified Location

Lastly, if an entity has a location associated with it, we will sort entities based on their proximity to the user’s location or to the location specified in the query.

Sorting strictly by distance is not always helpful, though. For instance, if one location is only 500 feet closer than another, that difference hardly matters to the user, and the results should instead be sorted by relevance. The Bucketed Distance feature converts a continuous distance value into a discrete range, making it easier to sort by both distance and relevance. Note that this ranking can also be overridden through custom sorting on the backend.

You’ll learn about this and other ways to add custom sorting in the Advanced Answers - Sorting Module.
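To illustrate the idea behind distance bucketing, here is a minimal sketch that groups distances into fixed-width buckets and then sorts by bucket first and relevance second. The 1-mile bucket size, the `distance_bucket` helper, and the sample data are all assumptions made for illustration; the actual feature may define its ranges differently.

```python
def distance_bucket(distance_miles, bucket_size_miles=1.0):
    """Convert a continuous distance into a discrete bucket index."""
    return int(distance_miles // bucket_size_miles)

# Made-up sample results: A is slightly closer, B is more relevant.
results = [
    {"name": "A", "distance_miles": 0.5, "relevance": 0.70},
    {"name": "B", "distance_miles": 0.6, "relevance": 0.90},
    {"name": "C", "distance_miles": 3.2, "relevance": 0.95},
]

# Sort by bucket first, then by relevance (highest first) within a bucket.
ranked = sorted(
    results,
    key=lambda r: (distance_bucket(r["distance_miles"]), -r["relevance"]),
)
print([r["name"] for r in ranked])  # ['B', 'A', 'C']
```

A and B fall into the same bucket, so the small difference in distance is ignored and the more relevant result B comes first. C is in a farther bucket, so it stays last despite having the highest relevance.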
