Overview of the Algorithm & How Indexing Works
What You’ll Learn
In this section, you will learn:
- How do the algorithms work?
- What are the three Search algorithms?
- What elements are controlled by the algorithms?
- What are different sources of data for the algorithms?
What are the Search Algorithms and how do they work?
You’ll hear us referencing the word ‘algorithm’ when we’re describing different elements of Yext Search. Don’t be frightened! An algorithm is simply something that takes a series of inputs, conducts a sequence of actions, and then returns outputs. We take a multi-algorithm approach with Search and have three separate algorithms today.
In our case, the Search algorithms take a series of inputs, such as:
- User Inputted Query
- User Location
They then use:
- Natural Language Processing to understand how those inputs map to specific intents
- Client’s Search Configuration to know how to treat each of those intents & any business logic
- Client’s Knowledge Graph to map those intents to specific entities in Yext
Given the scenario, the algorithm can output multiple things:
- Query Suggestions based on what a user has typed (autocomplete)
- List of entities that match the query
- Featured Snippets and Direct Answers
- Matched Search Terms and Detected Filters
- Suggested Spellcheck
- Detected Location of the User
We’ll dive into many of these topics in depth in future modules, but it’s important to understand that all of these components are controlled by the algorithms.
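To make the inputs and outputs above concrete, here is a minimal sketch of them as plain data structures. The class and field names are illustrative assumptions, not the actual Search API schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchRequest:
    """Inputs named above: what the user typed and where they are."""
    query: str
    # Hypothetical shape: (latitude, longitude); None if the location is unknown.
    user_location: Optional[tuple[float, float]] = None


@dataclass
class SearchResponse:
    """Outputs named above. Every field name here is an assumption."""
    query_suggestions: list[str] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)
    direct_answer: Optional[str] = None
    detected_filters: list[str] = field(default_factory=list)
    spellcheck_suggestion: Optional[str] = None
    detected_location: Optional[str] = None


request = SearchRequest(query="engineering jobs in new york")
response = SearchResponse(detected_filters=["new york"])
```

Most outputs are optional: a given query may produce a spellcheck suggestion, a detected filter, both, or neither.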
What are the three Search algorithms?
There isn’t a single perfect search algorithm, which is why Yext Search has three. Rather than relying on keyword-based search, Yext Search takes a multi-algorithm approach that focuses on natural language to surface the best results, similar to how the top consumer search engines work. We have one algorithm for each of three types of data, all of which can be loaded into a Yext Knowledge Graph: structured data, semi-structured data, and unstructured data. Let’s talk a bit about each algorithm:
Named Entity Recognition: Search for Structured Data
Yext Search uses Named Entity Recognition, based on BERT (Google’s open-source language model), to detect potential filters and show structured results from a Knowledge Graph. This works well for structured entities like products, events, and jobs.
To learn more about searching structured data, check out our algo pages on Yext.com.
Semantic Text Search: Search for Semi-Structured Data
Yext Search uses Semantic Text Search for FAQs and Help Article names. This content is more loosely structured than entities like products, events, or jobs. Instead of relying on keywords, we embed the search query and the FAQ or Help Article names in vector space and use an algorithm to determine the most relevant FAQ or Help Article. Our Semantic Text Search algorithm can identify FAQs and Help Articles that are similar in meaning to the user’s question. For example, we’ll identify that a query of “how is covid transmitted?” is semantically similar to “how does the virus spread?”. No synonyms required!
To learn more about searching semi-structured data, check out our algo pages on Yext.com.
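The idea of comparing a query and FAQ names in vector space can be sketched with a toy example. This assumes cosine similarity as the relevance measure, which is a common choice but not something the module specifies, and the three-dimensional vectors are made up by hand. A real system would get its embeddings from a trained language model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_relevant_faq(query_vector: list[float],
                      faq_vectors: dict[str, list[float]]) -> str:
    """Return the FAQ name whose embedding is closest to the query embedding."""
    return max(
        faq_vectors,
        key=lambda name: cosine_similarity(query_vector, faq_vectors[name]),
    )


# Hand-written toy embeddings, purely for illustration.
faq_vectors = {
    "How does the virus spread?": [0.9, 0.1, 0.0],
    "Where can I get tested?": [0.1, 0.9, 0.1],
    "What are your opening hours?": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the query "how is covid transmitted?".
query_vector = [0.8, 0.2, 0.1]

print(most_relevant_faq(query_vector, faq_vectors))
# prints "How does the virus spread?"
```

Note that the query and the winning FAQ share no keywords. They match only because their vectors point in nearly the same direction.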
Document Search: Search for Unstructured Data
Yext Search can search unstructured data to identify the most relevant documents. With Document Search (also known as Extractive QA), you can crawl, index, and search through blog posts, help articles, and product manuals and extract relevant snippets that answer the query posed.
To learn more about searching unstructured data, check out our algo pages on Yext.com.
Putting it all together, here are the three Search algorithms for the three data types in a Knowledge Graph:
- Structured Knowledge Graph Data -> Named Entity Recognition Algorithm
- Semi-Structured FAQs and Help Article Names -> Semantic Text Search Algorithm
- Unstructured Data -> Document Search Algorithm
Indexing the Knowledge Graph
To surface Knowledge Graph results for a query, that content must first be indexed so that query intent can be matched with the corresponding data. Note that this is not a simple index of links from keywords; rather, we index the content associated with each entity in a way that lets us search specifically on those attributes.
All entities, fields, and field values in your Knowledge Graph will be stored in this index. Any update made to the Knowledge Graph triggers re-indexing, and the index is updated in near real time.
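As a rough illustration of indexing by field rather than by keyword alone, here is a minimal sketch of a field-aware inverted index. It is a teaching toy under our own assumptions (the `FieldIndex` class, whitespace tokenization, and entity IDs are all hypothetical), not a description of how the production index is built.

```python
from collections import defaultdict


class FieldIndex:
    """Toy field-aware inverted index: (field, token) -> entity IDs."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)
        self._entities: dict[str, dict[str, str]] = {}

    def upsert(self, entity_id: str, fields: dict[str, str]) -> None:
        """Index an entity, replacing any earlier version of it."""
        self.remove(entity_id)
        self._entities[entity_id] = dict(fields)
        for field_name, value in fields.items():
            for token in value.lower().split():
                self._postings[(field_name, token)].add(entity_id)

    def remove(self, entity_id: str) -> None:
        """Drop an entity and all of its postings, if it was indexed."""
        old = self._entities.pop(entity_id, None)
        if old is None:
            return
        for field_name, value in old.items():
            for token in value.lower().split():
                self._postings[(field_name, token)].discard(entity_id)

    def search(self, field_name: str, term: str) -> set[str]:
        """Return IDs of entities whose given field contains the term."""
        return set(self._postings.get((field_name, term.lower()), set()))


index = FieldIndex()
index.upsert("job-1", {"name": "Software Engineer", "city": "New York"})
index.upsert("job-2", {"name": "Sales Engineer", "city": "Chicago"})
print(sorted(index.search("name", "engineer")))  # prints ['job-1', 'job-2']

# An update to an entity re-indexes it, so stale values stop matching.
index.upsert("job-2", {"name": "Sales Manager", "city": "Chicago"})
print(sorted(index.search("name", "engineer")))  # prints ['job-1']
```

Because postings are keyed by field as well as token, a search on `city` never matches text that only appears in `name`.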
Where do the results data come from?
The majority of our vertical experiences will be powered by the Knowledge Graph. With the Knowledge Graph as the primary source of information, we can:
- Define discrete fields that can be searched & how they are searched (backend)
- Structure the data for the results card (frontend)
However, we can also integrate with a third party to return content, such as link results, that we don’t want to store in the Knowledge Graph. We have a few pre-built integrations for Third Party Verticals we can offer to our clients. You can see a list of the built-in Third Party Verticals we offer in the Search Overview module.
You also have the option to build your own custom Third Party Vertical. All we need is an API endpoint that accepts a query and returns a list of results. We send the raw query exactly as the user entered it and render the results as they are returned from the third-party endpoint.
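To give a feel for what "an endpoint that accepts a query and returns a list of results" could look like, here is a minimal sketch using only the Python standard library. The URL path, the `query` parameter name, the response shape, and the sample documents are all assumptions made for illustration. Check the Third Party Vertical documentation for the actual request and response contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in data; a real endpoint would call the third party's own search.
DOCUMENTS = [
    {"title": "Return policy", "url": "https://example.com/returns"},
    {"title": "Shipping times", "url": "https://example.com/shipping"},
]


def find_results(query: str) -> list[dict]:
    """Return documents whose title contains any word of the raw query."""
    words = query.lower().split()
    return [d for d in DOCUMENTS if any(w in d["title"].lower() for w in words)]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Hypothetical contract: GET /search?query=<raw user query>
        params = parse_qs(urlparse(self.path).query)
        query = params.get("query", [""])[0]
        body = json.dumps({"results": find_results(query)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the toy endpoint locally (blocks until interrupted)."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()
```

Calling `serve()` starts the endpoint on localhost. The key point is that the ranking logic in `find_results` lives entirely on the third party's side, so the provider, not the Search configuration, decides which results come back.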
In summary: for Knowledge Graph Verticals, Yext Search determines the results based on the Search configuration. For Third Party Verticals, the third party provider decides the results.
What’s Controlled By the Algorithms Besides Results?
You may think that the only thing the algorithms impact is the set of results returned after a user submits a query. However, there are a few other helpful components controlled by the algorithms that you should know about!
You already learned how query suggestions are determined and returned in the Query Suggestions module. Every time you interact with a Yext Search search bar, a request is sent to the Search API.
When you first click into a search bar, an empty request is sent, and the API returns hardcoded prompts. As you start typing, you will see queries that begin with the search term entered, known as our popular queries.
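The behavior described here (prompts for an empty request, prefix-matched popular queries otherwise) can be sketched in a few lines. The function name, the `limit` parameter, and the assumption that `popular_queries` arrives already sorted by popularity are ours, not part of the Search API.

```python
def suggest(typed: str, prompts: list[str], popular_queries: list[str],
            limit: int = 5) -> list[str]:
    """Return prompts for empty input, else popular queries with that prefix."""
    prefix = typed.strip().lower()
    if not prefix:
        # Nothing typed yet: fall back to the hardcoded prompts.
        return prompts[:limit]
    # Assumes popular_queries is ordered from most to least popular.
    return [q for q in popular_queries if q.lower().startswith(prefix)][:limit]


prompts = ["Find a location near me", "Browse open jobs"]
popular_queries = ["checking account", "check order status", "credit card"]

print(suggest("", prompts, popular_queries))
# prints ['Find a location near me', 'Browse open jobs']
print(suggest("che", prompts, popular_queries))
# prints ['checking account', 'check order status']
```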
For a given query, the Search API can also return a spellcheck correction. Clicking on the suggestion re-runs the search with that spelling.
Each client has a separate spellchecking dictionary made up of:
- A generic dictionary for the supported language
- Historical Search Queries
- Content in the Knowledge Graph
You can train the algorithm’s spell checking per experience in Experience Training (Search > All Search Experiences > View Experience > Spell Checking). On this screen, you can accept or reject any corrections applied by the algorithm. If you reject a correction, the algorithm will no longer apply that correction to that search term in your experience.
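Here is a minimal sketch of how a per-client dictionary and rejected corrections could fit together. It uses Python's `difflib` for fuzzy matching as a stand-in. The actual spellcheck algorithm is not described in this module, and the function names, the 0.8 cutoff, and the sample data are all assumptions.

```python
import difflib
from typing import Iterable, Optional


def build_dictionary(generic_words: Iterable[str],
                     historical_queries: Iterable[str],
                     knowledge_graph_values: Iterable[str]) -> set[str]:
    """Combine the three sources named above into one set of known words."""
    words = {w.lower() for w in generic_words}
    for text in list(historical_queries) + list(knowledge_graph_values):
        words.update(text.lower().split())
    return words


def suggest_correction(term: str, dictionary: set[str],
                       rejected: set[tuple[str, str]]) -> Optional[str]:
    """Return the closest known word, unless that correction was rejected."""
    term = term.lower()
    if term in dictionary:
        return None  # already a known word, nothing to correct
    matches = difflib.get_close_matches(term, sorted(dictionary), n=1, cutoff=0.8)
    if not matches or (term, matches[0]) in rejected:
        return None
    return matches[0]


dictionary = build_dictionary(
    generic_words=["checking", "account"],
    historical_queries=["open savings account"],
    knowledge_graph_values=["Chicago branch"],
)

print(suggest_correction("acount", dictionary, rejected=set()))
# prints "account"
print(suggest_correction("acount", dictionary, rejected={("acount", "account")}))
# prints "None"
```

Keeping rejections as (search term, correction) pairs mirrors the behavior described above: rejecting one correction stops it for that term without affecting how other terms are corrected.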