Overview of the Algorithm & How Indexing Works
What You’ll Learn
In this section, you will learn:
- How do the algorithms work?
- What are the three Answers algorithms?
- What elements are controlled by the algorithms?
- What are different sources of data for the algorithms?
What are the Answers Algorithms and how do they work?
You’ll hear us referencing the word ‘algorithm’ when we’re describing different elements of Answers. Don’t be frightened! An algorithm is simply something that takes a series of inputs, conducts a sequence of actions, and then returns outputs. We take a multi-algorithm approach with Answers and have three separate algorithms today.
In our case, the Answers algorithms take a series of inputs, such as:
- User Inputted Query
- User Location
It then uses:
- Natural Language Processing to understand how those inputs map to specific intents
- Client’s Search Configuration to know how to treat each of those intents & any business logic
- Client’s Knowledge Graph to map those intents to specific entities in Yext
Given the scenario, the algorithm can output multiple things:
- Query Suggestions based on what a user has typed (autocomplete)
- List of entities that match the query
- Featured Snippets and Direct Answers
- Matched Search Terms and Detected Filters
- Suggested Spellcheck
- Detected Location of the User
We’ll dive into many of these topics in depth in future modules, but it’s important to understand that all of these components are controlled by the algorithms.
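As a rough mental model, the inputs and outputs listed above can be pictured as two simple records. The sketch below is only an illustration: every class name, field name, and the stub function are hypothetical, and none of them reflect the actual Answers API request or response format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchInputs:
    """Hypothetical container for what the algorithms receive."""
    query: str                     # the user-inputted query
    user_location: Optional[str]   # e.g. a city name; None if unknown


@dataclass
class SearchOutputs:
    """Hypothetical container for what the algorithms can return."""
    query_suggestions: List[str] = field(default_factory=list)
    entities: List[dict] = field(default_factory=list)
    direct_answer: Optional[str] = None
    matched_search_terms: List[str] = field(default_factory=list)
    detected_filters: List[str] = field(default_factory=list)
    spellcheck_suggestion: Optional[str] = None
    detected_location: Optional[str] = None


def run_search(inputs: SearchInputs) -> SearchOutputs:
    """Stub that only passes the location through. The NLP, search
    configuration, and Knowledge Graph steps are not modeled here."""
    return SearchOutputs(detected_location=inputs.user_location)


result = run_search(SearchInputs(query="doctors near me", user_location="New York"))
print(result.detected_location)
```

Which output fields are actually populated depends on the scenario, which is why most of them default to empty in this sketch.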
What are the three Answers algorithms?
Yext takes a multi-algorithm approach with Answers that focuses on natural language rather than keyword matching. There isn’t a single perfect search algorithm, which is why Answers has three, similar to how the top consumer search engines work. There is one algorithm for each of the three types of data that can be loaded into a Yext Knowledge Graph: structured data, semi-structured data, and unstructured data. Let’s talk a bit about each algorithm:
Named Entity Recognition: Search for Structured Data
Answers uses Named Entity Recognition, based on Google’s open-source language model BERT, to detect potential filters in a query and show structured results from a Knowledge Graph. This works well for structured entities like products, events, and jobs.
To learn more about searching structured data, check out our algo pages on Yext.com.
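To make "detecting potential filters" more concrete, here is a deliberately simplified sketch. It replaces the machine-learned Named Entity Recognition model with plain substring matching against known field values, so it shows the idea of mapping parts of a query to fields, not how Answers actually does it. The field names and values are invented for illustration.

```python
# Hypothetical field values that might exist in a Knowledge Graph.
KNOWN_VALUES = {
    "job_title": ["software engineer", "nurse"],
    "city": ["new york", "chicago"],
}


def detect_filters(query: str) -> dict:
    """Return {field: value} for every known value found in the query.

    This is simple substring matching, NOT a BERT-based model.
    """
    text = query.lower()
    found = {}
    for field_name, values in KNOWN_VALUES.items():
        for value in values:
            if value in text:
                found[field_name] = value
    return found


print(detect_filters("Software engineer jobs in Chicago"))
```

Here the query is split into two detected filters, one on the hypothetical job_title field and one on city. A learned model can generalize beyond exact matches, which this lookup cannot.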
Semantic Text Search: Search for Semi-Structured Data
Answers uses Semantic Text Search for FAQs and Help Article names. This content is more loosely structured than entities like products, events, or jobs. Instead of relying on keywords, we embed the search query and the FAQ or Help Article names in a vector space and use an algorithm to determine the most relevant FAQ or Help Article. This lets Semantic Text Search identify FAQs and Help Articles that are similar in meaning to the user’s question. For example, we’ll identify that a query of “how is covid transmitted?” is semantically similar to “how does the virus spread?”. No synonyms required!
To learn more about searching semi-structured data, check out our algo pages on Yext.com.
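The idea of comparing a query and FAQ names in a vector space can be sketched with a few lines of Python. The example below assumes cosine similarity as the comparison measure and uses made-up three-dimensional vectors; the text above does not say which measure or embedding model Answers uses, so treat both as assumptions.

```python
import math

# Made-up 3-dimensional embeddings. A real system would obtain
# these from a trained language model.
FAQ_EMBEDDINGS = {
    "How does the virus spread?": [0.9, 0.1, 0.0],
    "What are your store hours?": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_faq(query_embedding):
    """Return the FAQ name whose embedding is closest to the query's."""
    return max(
        FAQ_EMBEDDINGS,
        key=lambda name: cosine_similarity(query_embedding, FAQ_EMBEDDINGS[name]),
    )


# Pretend this vector is the embedding of "how is covid transmitted?"
query_vec = [0.8, 0.2, 0.1]
print(most_similar_faq(query_vec))
```

The query and the matching FAQ share no keywords; they are matched only because their (pretend) embeddings point in a similar direction.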
Document Search: Search for Unstructured Data
Yext Answers can search unstructured data to identify the most relevant documents. With Document Search (also known as Extractive QA), you can crawl, index, and search through blog posts, help articles, and product manuals and extract relevant snippets that answer the query posed.
To learn more about searching unstructured data, check out our algo pages on Yext.com.
Putting it all together, here are the three Answers algorithms for the three data types in a Knowledge Graph:
- Structured Knowledge Graph Data -> Named Entity Recognition Algorithm
- Semi-Structured FAQs and Help Article Names -> Semantic Text Search Algorithm
- Unstructured Data -> Document Search Algorithm
Indexing the Knowledge Graph
To surface Knowledge Graph results for a query, that content must first be indexed so that query intent can be matched with the corresponding data. Note that this is not a simple index of links from keywords; rather, we index the content associated with each entity so that we can search specifically on those attributes.
All entities, fields, and field values in your Knowledge Graph will be stored in this index. Any updates made to the Knowledge Graph will trigger indexing, and the index will be updated in near real time.
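One way to picture an index that is searchable by attribute is a mapping from (field, word) pairs to entity IDs. The sketch below is a minimal illustration under that assumption; the entity IDs, field names, and the full-rebuild-on-update approach are all hypothetical and say nothing about how the real index is stored or refreshed.

```python
from collections import defaultdict

# Hypothetical entities; IDs, field names, and values are invented.
entities = {
    "e1": {"name": "Downtown Clinic", "specialty": "cardiology"},
    "e2": {"name": "Uptown Clinic", "specialty": "dermatology"},
}


def build_index(entities):
    """Map (field, token) -> set of entity IDs, so each field
    can be searched on its own."""
    index = defaultdict(set)
    for entity_id, fields in entities.items():
        for field_name, value in fields.items():
            for token in value.lower().split():
                index[(field_name, token)].add(entity_id)
    return index


index = build_index(entities)
print(sorted(index[("name", "clinic")]))           # matches on the name field
print(sorted(index[("specialty", "cardiology")]))  # matches on the specialty field

# Simulate a Knowledge Graph update followed by re-indexing.
entities["e2"]["specialty"] = "cardiology"
index = build_index(entities)
print(sorted(index[("specialty", "cardiology")]))
```

Because the key includes the field name, a search for "cardiology" in the specialty field never matches an entity that merely has that word in its name.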
Where do the results data come from?
The majority of our vertical experiences will be powered by the Knowledge Graph. With the Knowledge Graph as the primary source of information, we can:
- Define discrete fields that can be searched & how they are searched (backend)
- Structure the data for the results card (frontend)
However, we do have the ability to integrate with a third party to return content, such as link results, that we don’t want to store in the Knowledge Graph. We have a few pre-built integrations for Third Party Verticals we can offer to our clients. You can see a list of the built-in Third Party Verticals we offer in the Answers Overview module.
You also have the option to build your own custom Third Party Vertical. All we need is an API endpoint that accepts a query and returns a list of results. We send the raw query itself entered by the user, and render the results as they’re returned from the third party endpoint.
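The shape of such an endpoint can be sketched as a single function that takes the raw query and returns a list of results. The example below is an assumption-heavy illustration: the function name, the sample articles, and the JSON layout with a "results" key are all hypothetical, and the exact response format a custom Third Party Vertical must return is not described here.

```python
import json

# Hypothetical third-party content. A real endpoint would query
# its own backend instead of a hardcoded list.
ARTICLES = [
    {"title": "Resetting your password", "url": "https://example.com/reset"},
    {"title": "Updating billing details", "url": "https://example.com/billing"},
]


def handle_search(query: str) -> str:
    """Accept the raw user query and return a JSON list of results."""
    terms = query.lower().split()
    matches = [
        article
        for article in ARTICLES
        if any(term in article["title"].lower() for term in terms)
    ]
    return json.dumps({"results": matches})


print(handle_search("password help"))
```

Note that all of the matching logic lives inside the third party's function. Answers only forwards the query and renders whatever comes back.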
In summary, for Knowledge Graph Verticals, Answers determines the results based on the Answers configuration. For Third Party Verticals, the third party provider decides the results.
What’s Controlled By the Algorithms Besides Results?
You may think that the only thing the algorithms impact is the set of results returned after a user submits a query. However, there are a few other helpful components controlled by the algorithms you should know about!
You already learned how query suggestions are determined and returned in the Query Suggestions module. Every time you interact with an Answers search bar, a request is sent to the Answers API.
When you first click into a search bar, an empty request is sent, and the API returns hardcoded prompts. As the user starts typing, the API returns queries that begin with the text entered so far, known as popular queries.
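The two cases described above, an empty request versus a partially typed query, can be sketched as follows. The prompt and query lists and the function name are hypothetical, and the real API may rank or filter suggestions in ways this sketch does not capture.

```python
# Hypothetical data: prompts configured for the empty search bar,
# and popular past queries.
HARDCODED_PROMPTS = ["Find a location", "Browse open jobs"]
POPULAR_QUERIES = ["store hours", "store locator", "return policy"]


def suggest(typed: str, limit: int = 5):
    """Return prompts for an empty input, otherwise popular
    queries that start with what the user has typed."""
    if not typed.strip():
        return HARDCODED_PROMPTS[:limit]
    prefix = typed.lower()
    return [q for q in POPULAR_QUERIES if q.startswith(prefix)][:limit]


print(suggest(""))     # empty request: hardcoded prompts
print(suggest("sto"))  # typed text: popular queries with that prefix
```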
The Answers API can also return a spellcheck correction for a submitted query. Clicking on the suggestion re-runs the search with the corrected spelling.
Each client has a separate spellchecking dictionary made up of:
- A generic dictionary for the supported language
- Historical Search Queries
- Content in the Knowledge Graph
You can train the algorithm’s spell checking per experience in Experience Training (Answers > All Answers Experiences > View Experience > Spell Checking). On this screen, you can accept or reject any corrections applied by the algorithm. If you reject a correction, the algorithm will no longer apply that correction for a given search term for your experience.
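The combination of a per-client dictionary and rejected corrections can be sketched with Python's standard difflib module. This is a minimal illustration, not the actual spellcheck algorithm: the word lists are invented, the fuzzy matching is simply what difflib provides, and the set of rejected pairs is a hypothetical stand-in for decisions made in Experience Training.

```python
import difflib

# Hypothetical word sources for one client's dictionary.
generic_dictionary = {"insurance", "doctor", "hours"}
historical_queries = {"cardiology"}
knowledge_graph_content = {"telehealth"}
dictionary = sorted(generic_dictionary | historical_queries | knowledge_graph_content)

# (typed term, correction) pairs rejected for this experience.
rejected = {("telehealt", "telehealth")}


def spellcheck(term: str):
    """Return a suggested correction, or None if the term is known,
    nothing is close enough, or the correction was rejected."""
    if term in dictionary:
        return None
    matches = difflib.get_close_matches(term, dictionary, n=1)
    if not matches or (term, matches[0]) in rejected:
        return None
    return matches[0]


print(spellcheck("cardiolgy"))  # close to a word from historical queries
print(spellcheck("telehealt"))  # correction exists but was rejected
```

The second call returns None even though a close match exists, mirroring how a rejected correction is no longer applied for that search term.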