I had an inquiry about how the algorithm handles misspelled queries. I understand the following (see below for my proposed response) but would like to both confirm that my understanding is true and get more information about the logic, if possible:
Right now, our algorithm processes misspelled words, and if it recognizes one as a common misspelling and has results for the correctly spelled word, those results will appear.
If the algorithm recognizes the query as a possible misspelled word but isn’t very confident in the correctly spelled word, the experience will surface a spell check prompt where the user can choose the recommended spelling of the word and the Answers experience will refresh with that query.
In other instances, the algorithm does not recognize the word and will request that the user check their spelling and try again. In these cases, the Yext team can include synonyms in the backend to force the first behavior (when the word is misspelled, it automatically performs the query with the intended spelling).
If we are seeing a high-queried word/phrase being commonly misspelled, our team will include these as synonyms proactively.
Thanks for any guidance on how to address this!
There are three ways a misspelling or typo could be handled by Answers.
1. Spellcheck
Every Answers experience automatically has spellcheck built in. Spellcheck uses a “dictionary” that contains common words found across the internet. It combines that with data it finds in the Knowledge Graph to automatically correct spelling. Importantly, it will only include data from searchable fields in the Knowledge Graph.
2. Typo Tolerance
Certain algorithms inside of Yext will support some level of typo tolerance. For example, if a user searches for “backgamon” and the entity “backgammon” exists in the KG, it might still match. The level of typo tolerance depends on the length of the word (token) and the type of algorithm. Currently nlpFilter and textSearch incorporate typo tolerance. With typo tolerance, even if the spellchecker fails, the correct result might still show up.
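To make the idea of length-dependent typo tolerance concrete, here is a minimal Python sketch. It is not Yext's implementation: the edit-distance measure (plain Levenshtein), the function names, and the length thresholds are all assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def allowed_typos(token: str) -> int:
    """Hypothetical thresholds: longer tokens tolerate more typos."""
    if len(token) <= 4:
        return 0  # very short words get no tolerance
    if len(token) <= 8:
        return 1
    return 2


def fuzzy_match(query_token: str, field_token: str) -> bool:
    """True if the tokens are within the allowed edit distance."""
    q, f = query_token.lower(), field_token.lower()
    return edit_distance(q, f) <= allowed_typos(q)


print(fuzzy_match("backgamon", "backgammon"))
print(fuzzy_match("mac", "mak"))
```

Under these assumed thresholds, “backgamon” still matches “backgammon”, while a three-letter query gets no tolerance at all.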
3. Semantic Understanding
Finally, our new Semantic Text Search doesn’t rely on keyword matches and instead looks at semantic intent. This type of algorithm often automatically handles common misspellings and typos. The general rule of thumb is if a human could easily understand what the user is looking for then semantic understanding should be able to understand as well. Note that this algo only works in English on FAQ Name (the question) right now but we are exploring rolling this out to more fields.
Stemming - Some of our algos (e.g. textSearch) will automatically stem words to better match. For example, if you search for “fishing” and an entity has the word “fish”, it will match even though it’s not the exact same word.
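As a rough illustration of stemming, the Python sketch below strips a few common suffixes before comparing tokens. The suffix list and function names are assumptions for this example only; production stemmers are considerably more careful, and this is not how textSearch is implemented.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only, far from complete


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems_match(query: str, field_text: str) -> bool:
    """True if any stemmed query token equals any stemmed field token."""
    query_stems = {naive_stem(t) for t in query.split()}
    field_stems = {naive_stem(t) for t in field_text.split()}
    return bool(query_stems & field_stems)


print(naive_stem("fishing"))
print(stems_match("fishing", "fish tacos"))
```

Both “fishing” and “fish” reduce to the same stem here, which is why the query and the field value match.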
Location Detection - Yext has a built-in dictionary for identifying place names. This dictionary includes some common misspellings and acronyms specifically for place names. For example, New York City.
Let me know if that helps,
Thanks Max for that explanation!
I have a healthcare client who’s seeing a ton of traffic on their Answers experience, which is specifically geared towards providers. I’m wondering if we have any recommendations for searches that are similar to people’s names but are not currently yielding Knowledge Graph results (e.g. searching for “mac” does not show Dr. Mak). I have first, last and full names set up with Text Search and Phrase Match per this community post. Since we do see a decent volume of misspellings come in, I’ve started adding them as keywords to the relevant entities so that they appear, but this is unfortunately overriding other entities (e.g. adding “maria” as a keyword for someone’s alternative name filters out other providers whose names are also Maria). Do you have any recommendations here?
In general, we will try to catch misspellings of names with our spell checker. Our spell checking algorithm is unique in that it heavily considers the data in your Knowledge Graph, so in general it should be able to catch misspellings of the names and other words that appear in the graph.
This doesn’t always work, since sometimes the algorithm isn’t statistically certain that something is a misspelling. For example “mac” is a real word as well, which is likely why it isn’t offering a spelling correction to “mak”. We also don’t apply typo tolerance to very short words like this one.
Can you share some other examples of queries that aren’t working? Are there any other patterns you’re seeing?
Thanks for that context! You’re right that a lot of the name queries producing zero KG results seem to be real words, like “pop” (provider named Popp) and “trap” (provider named Trapp). Adding [[name]] and [[lastname]] to the vertical prompts has been a huge help in reducing spelling errors among users.
That said, I’m also seeing some pretty obvious misspellings like “ashgar” for Asghar and “lange” for Langer that are also showing no results. Is it better to let the algorithm learn these on its own or try to intervene?
I think my main question is what our best practice is for adding alternative names, for instance when people change their last names or are known by another name. I have a Catalina Merrick who isn’t appearing in results for “maria merrick”, which she also goes by, and a Diane Mak whose surname was formerly Aw. These are currently being added to keywords for which I have an NLP filter set, but would textSearch make more sense to avoid filtering out other providers?
I would recommend treating “alternate names” such as maiden names or nicknames the same way you’d treat regular names - by using phrase match on the individual words. So you might consider adding an alternate names text-list field and searching it with phrase match. (I wouldn’t recommend NLP filter because it can be a bit too restrictive.)
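Here is a toy Python model of why phrase matching on individual name values is gentler than a filter. Everything in it is an assumption for illustration: the `Provider` class, the `alternate_names` field, the scoring rule, and the provider “Maria Lopez” are hypothetical, and this is not how Answers ranks results internally.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    first_name: str
    last_name: str
    # Hypothetical text-list field holding maiden names, nicknames, etc.
    alternate_names: list[str] = field(default_factory=list)

    def searchable_values(self) -> list[str]:
        return [self.first_name, self.last_name, *self.alternate_names]


def contains_phrase(query_tokens: list[str], phrase: str) -> bool:
    """True if every word of `phrase` appears contiguously in the query."""
    words = phrase.lower().split()
    n = len(words)
    return n > 0 and any(
        query_tokens[i:i + n] == words
        for i in range(len(query_tokens) - n + 1)
    )


def rank(query: str, providers: list[Provider]) -> list[tuple[Provider, int]]:
    """Score each provider by how many of its values the query contains."""
    tokens = query.lower().split()
    scored = [
        (p, sum(contains_phrase(tokens, v) for v in p.searchable_values()))
        for p in providers
    ]
    # Keep every provider with at least one match; best matches first.
    matches = [(p, s) for p, s in scored if s > 0]
    return sorted(matches, key=lambda ps: ps[1], reverse=True)


providers = [
    Provider("Catalina", "Merrick", ["Maria"]),
    Provider("Maria", "Lopez"),  # hypothetical second provider
    Provider("Diane", "Mak", ["Aw"]),
]
for provider, score in rank("maria merrick", providers):
    print(provider.first_name, provider.last_name, score)
```

In this sketch a search for “maria merrick” ranks Catalina Merrick first but still returns the other Maria, rather than removing her from the results.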
Regarding the spell check issue, this is something you will soon be able to solve with experience training. In our most recent monthly release, we released Experience Training for Spell Checking, giving you the ability to reject invalid spelling corrections. In the summer release, you will also be able to do the reverse - i.e. enforcing a spelling correction where there originally was none.
Hope this helps!