Searching Unstructured Data

Author: Max Davish, Associate Product Manager
Blog Date: May 2021

When we first started building Answers almost two years ago, Yext was full of structured data. As a location listings company, we stored huge amounts of information about restaurants, hotels, doctors, hospitals, financial advisors, and more. This type of data is highly structured in nature. Each entity's data is organized into a series of fields and values with a uniform schema. The data needs to be this way so that it can be sent to hundreds of publishers across the web in a consistent way.

As a result, the first type of data we learned how to search in Answers was structured data. To that end, we developed a family of algorithms like Named Entity Recognition, NLP Filtering, and Field Value Direct Answers that are designed to answer questions about structured data. This allowed us to provide an unparalleled search experience for things like people, places, and products.

As Answers grew, we quickly began to learn that not all data is neatly structured like this. Pretty soon, frequently asked questions became the most searched entity type in Answers, surpassing locations, healthcare professionals, and other more structured entities. Every business in the world needs to provide answers to their customers' frequently asked questions, no matter the industry.

FAQs aren't like structured data. They contain a lot of semantic information. The order of words matters a lot, and there are a million ways to ask the same question. And so we created a new algorithm, Semantic Text Search, that was designed to search this type of semi-structured data by measuring the similarity in meaning between two strings of text, instead of just looking at individual keywords. This new algorithm helped Answers understand users' questions better than ever before.
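To make the contrast between keyword matching and similarity in meaning concrete, here is a minimal sketch. It is not the Semantic Text Search implementation: the two-dimensional word vectors in `TOY_VECTORS` are hand-written for illustration, and the helper names (`embed`, `cosine`, `best_match`) are hypothetical. A real system would get its vectors from a trained language model.

```python
import math

# Hypothetical, hand-written 2-d word vectors (assumption, for illustration only).
# Dimension 0 roughly means "about refunds", dimension 1 "about opening hours".
TOY_VECTORS = {
    "money": (0.9, 0.1),
    "back": (0.8, 0.1),
    "refund": (1.0, 0.0),
    "policy": (0.6, 0.2),
    "open": (0.1, 0.9),
    "hours": (0.0, 1.0),
}


def keyword_overlap(a: str, b: str) -> int:
    """Count the words two strings share (plain keyword matching)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def embed(text: str) -> tuple:
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u: tuple, v: tuple) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def best_match(query: str, candidates: list) -> str:
    """Return the candidate whose embedding is closest to the query's."""
    query_vec = embed(query)
    return max(candidates, key=lambda c: cosine(query_vec, embed(c)))


faqs = ["what is your refund policy", "when are you open"]
query = "can i get my money back"

# The query shares no words with either FAQ, so keyword matching cannot rank them.
overlaps = [keyword_overlap(query, faq) for faq in faqs]
# Comparing meaning vectors still picks the refund question.
match = best_match(query, faqs)
```

In this toy setup the query and the refund FAQ have no words in common, yet their vectors point in nearly the same direction, which is the property a semantic approach relies on.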

But what about unstructured data? A lot of data doesn't lend itself to structure at all. You can't easily take a long help article, or blog post, or Wikipedia page and turn it into structured (or even semi-structured) data. Most companies have a huge amount of this type of data, and almost all of them struggle to search it effectively. This frustrates both customers and employees, who have to waste valuable time digging through content to find the answers they're looking for.

Our customers needed a better way to efficiently search this type of data, so we built one.

Introducing Document Search

The challenge with searching long, unstructured documents is that it's not enough to just show the user the right result - you also have to show them which part of the result answers their question.

This is why Google doesn't just show lists of blue links anymore. It now shows featured snippets that directly answer users' questions. Consider the following query:

When you ask "where did bill gates grow up", Google extracts the relevant answer and snippet from an underlying web page and delivers it to you right on the search engine results page (SERP). Why do this? After all, Google earns money from people clicking on links, which you don't need to do if the answer is right there in the SERP.

Google does this because it provides a dramatically better user experience. It saves users the hassle of combing through long documents to find the answer they're looking for. Instead, Google lets AI do the work of reading the document and searching for the answer.

With Yext Answers' new document search feature, you can now offer your users that same experience on your very own website. Document search uses a natural language processing algorithm called extractive question answering to identify the passage or phrase from a long document that answers a user's question and return it as a featured snippet.
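As a rough illustration of what "return it as a featured snippet" can mean in practice, the sketch below takes an answer's character offsets within a document and returns the surrounding sentence, along with where to highlight the answer inside it. The function name `build_snippet`, the returned dictionary shape, and the simple period-based sentence detection are all assumptions made for this example, not the Yext Answers API.

```python
def build_snippet(document: str, start: int, end: int) -> dict:
    """Return the sentence around document[start:end] and highlight offsets.

    Sentence boundaries are found with a naive period search, which is
    enough for a sketch but would mishandle abbreviations like "U.S.".
    """
    if not (0 <= start < end <= len(document)):
        raise ValueError("answer span must lie inside the document")

    # The sentence starts after the last ". " before the answer (or at 0).
    boundary = document.rfind(". ", 0, start)
    sent_start = 0 if boundary == -1 else boundary + 2

    # The sentence ends at the first period at or after the answer's end.
    period = document.find(".", end)
    sent_end = len(document) if period == -1 else period + 1

    return {
        "answer": document[start:end],
        "snippet": document[sent_start:sent_end],
        # Offsets of the answer relative to the snippet, for bolding in the UI.
        "highlight": (start - sent_start, end - sent_start),
    }


document = (
    "Bill Gates was born in 1955. He grew up in Seattle, Washington. "
    "He later co-founded Microsoft."
)
answer = "Seattle, Washington"
start = document.index(answer)
result = build_snippet(document, start, start + len(answer))
```

Here `result["snippet"]` is the single sentence containing the answer, so the reader sees the answer in context without the rest of the document.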

Like many other parts of the Answers algorithm, extractive QA uses transformer neural networks (like BERT) to parse language and identify the answers to users' questions. Because the algorithm is trained to understand language, it's able to answer questions on content it's never seen before.
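Extractive QA models of this kind typically score every token in a passage as a possible start and a possible end of the answer, then pick the best-scoring span. The sketch below shows only that final selection step. The scores are made-up numbers standing in for model output, and `best_span` is a hypothetical helper, not a function from Yext Answers or any particular library.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Pick the (start, end) token indices with the highest combined score.

    Both indices are inclusive. The span must satisfy start <= end and
    contain at most max_len tokens. Returns None for empty input.
    """
    best = None
    best_score = float("-inf")
    for i, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within max_len tokens.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start_score + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


tokens = "Gates grew up in Seattle Washington with his family".split()

# Made-up scores (assumption): a trained model would produce one start score
# and one end score per token, conditioned on the user's question.
start_scores = [0.1, 0.0, 0.0, 0.2, 3.0, 0.5, 0.0, 0.0, 0.1]
end_scores = [0.0, 0.0, 0.1, 0.0, 1.0, 2.5, 0.1, 0.0, 0.3]

span = best_span(start_scores, end_scores)
answer = " ".join(tokens[span[0]:span[1] + 1])
```

Because the span is chosen by position rather than looked up in a fixed list of answers, the same procedure applies to any passage, which is consistent with the point above about answering questions on unseen content.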

Here's what Document Search looks like in Yext Answers:

This Answers experience searches the unstructured Wikipedia biographies of every US President, and it's able to directly answer complicated questions like this one. Rather than forcing the user to read several paragraphs to find the answer she's looking for, Answers delivers it directly in the search results.

Providing direct answers to questions from unstructured content saves time and creates a dramatically better user experience. That's why Yext now uses document search on our own help site, and the difference is clear.

Before:

After:

Instead of showing our users a list of unhelpful links, we now show direct answers to their questions. This means fewer support tickets and happier customers.

With the addition of Document Search, Yext Answers now offers a powerful tool for searching any type of data regardless of structure. Whether your users are searching for the right doctor, asking questions about your return policy, or perusing your help articles, Yext has the answers.

Using Document Search

To activate document search on your Answers Experience, simply navigate to your search configuration and choose Document Search on any field that contains long-form text.

Or, in JSON if you prefer:

You can learn even more in our new Document Search module.

Frequently Asked Questions

How can I try out document search?

Try it out by building a US Presidents search experience like the one shown above. This guide uses a solution template that comes preloaded with the data you'll need.

What fields and entity types does document search work on?

Document search works on any field and any entity type.

What types of content does document search work with?

Although it can be activated on any field, document search works best on fields that contain long-form text content - ideally multiple paragraphs. You shouldn't activate document search on a Short Description field.

What are common use cases for document search?

A common use case for document search is customer support self-service. Just as Yext does on our own help site, you can use document search to surface instant answers drawn directly from your business's FAQs, guides, tutorials, videos, ebooks, product manuals, and more -- allowing both your customers and agents to quickly find the answers they're looking for.

Be on the lookout for our new Workplace solution, where you can also apply document search to your company's intranet search!

Does document search work with other searchable field types?

Yes! Document search can be used in conjunction with any other searchable field type. However, you can only activate document search on one field per vertical.

How do I fix a featured snippet that's wrong?

Even Google gets featured snippets wrong from time to time. It happens. But with our new Experience Training feature, you can give the Answers algorithm feedback on the featured snippets it produces. On the Experience Training page, you'll see a list of featured snippets that have been shown in your experience.

What languages is this available in?

Currently, Document Search is only available in English, but we will be expanding support to other major languages in the future.
