Search Term Clusters

Author: Ariana Martino, Platform Product Marketing
Products: Answers, Analytics
Blog Date: November 2021

The value of search data

In an ideal customer journey, a customer searches for your brand, finds relevant content, and clicks to convert. Say they searched for "locations near me" on your website. They should see structured results with details on your locations, including a prominent call to action, like "Get Directions," that they might click to ultimately convert with your brand. You'd want to aggregate information about their search experience in the form of search analytics.

Search analytics serve two main goals:

  1. Gathering customer intelligence — When users search, they are giving you invaluable customer intelligence about their wants, needs, and expectations for your brand.
  2. Closing content gaps — Analyzing your search data can reveal areas where your search experience is not returning the answers users expect. Then, you can easily fill the content gap by adding new information to your Knowledge Graph.

While 84% of users prefer to transact using some kind of search, 68% of people who have a bad experience with any search on your website won't return. That's why identifying and closing content gaps is critical to maintaining an effective AI Search experience.

How to use Search Term Clusters

It's important to spot trends in your search analytics, but the volume of data can be overwhelming. Your users can run hundreds, if not thousands, of searches every day, and search term data has a unique human quality to it. Two customers with the same intent, searching in the same search bar, are likely to phrase their question in slightly different ways.

That's why Yext created Search Term Clusters, a tool that leverages the power of AI to make analyzing trends in search data scalable for our Answers customers. Search Term Clusters uses AI to automatically detect patterns in user intents so you can get a high-level view of what users are searching for.

This machine learning tool identifies different search terms that share a common intent and groups them into one cluster. For example, on a restaurant's site, searches for terms like "check my balance" and "Card balance" might each be too infrequent to stand out on their own, but when grouped, it's clear that users want to know more about gift card balances.

To help you prioritize optimizations to your experience, Search Term Clusters are categorized by:

  • Size — Small Cluster vs. Large Cluster
  • Performance — Performing Well vs. Needs Attention

If searches on a topic have low engagement, its cluster would be highlighted as Needs Attention so you can add content that addresses the trending topic. Starting with Large Clusters helps you focus your efforts on the answers your users want most.

For example, if you have a Large Cluster for "covid vaccine appointment" that Needs Attention on your healthcare site search, you could meaningfully optimize your search experience simply by adding an FAQ pointing users towards your appointment scheduler.

Yext's custom AI clustering model

So how does this all work under the hood? Of course, to a human being, it's pretty clear that two searches like "locations near me" and "stores near me" are asking the same fundamental question, but this can be much harder to communicate to a computer because there is so much variability in how a user might phrase a search. It requires a large amount of training data and sophisticated algorithms.

The first step is to represent queries, or pieces of text, as points in space. While this seems abstract, it really means that we represent the query as a series of numbers, or "embeddings." This same concept powers Yext's semantic text search algorithm.

We turn queries into points in space using a neural network called BERT, or Bidirectional Encoder Representations from Transformers. BERT is a breakthrough open source machine learning framework for Natural Language Processing (NLP) that Google introduced in 2018 and later applied to its core search algorithm to better understand user queries. BERT transforms queries into embeddings so that the more similar two queries are, the closer together they'll be in space.

We can imagine this in two dimensions:

Terms like "outage map" and "outage near me" are more similar to one another, and thus closer together on a plane, than they are to terms unrelated to outages like "tv lineup." While it is easiest to visualize the concept of embedding in two dimensions, in reality, Yext's embeddings use over 700 dimensions to represent each query.
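To make the two-dimensional picture concrete, here is a minimal sketch in Python. The coordinates are made up by hand for illustration and are not output from BERT or from Yext's model; the names `toy_embeddings`, `distance`, and `nearest_term` are hypothetical.

```python
import math

# Hypothetical 2-D coordinates chosen by hand for illustration only.
# Real embeddings come from a trained model and have hundreds of dimensions.
toy_embeddings = {
    "outage map": (1.0, 1.2),
    "outage near me": (1.3, 0.9),
    "tv lineup": (6.0, 5.5),
}


def distance(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_term(term, embeddings):
    """Return the other term whose point lies closest to the given term."""
    others = (t for t in embeddings if t != term)
    return min(others, key=lambda t: distance(embeddings[term], embeddings[t]))


print(nearest_term("outage map", toy_embeddings))
```

The same `distance` function works unchanged for points with 700 or more coordinates, which is why the two-dimensional intuition carries over to the real embeddings.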

At this point, we have each search term represented as a series of over 700 numbers, which may mean very little to a person but means a lot to a computer. The next step is for the machine learning model to look through the embeddings to find patterns in the numbers so we can cluster together similar terms.

We use an approach called DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, for this. DBSCAN works by evaluating each point in space (in our case, each search term) and looking for nearby, and therefore similar, points. A group of points that are close to one another form a cluster.
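The idea behind DBSCAN can be sketched in a few lines of plain Python. This is a simplified teaching version, not Yext's production code: the parameter values (`eps`, `min_samples`), the example search terms, and their 2-D points are all assumptions chosen for illustration.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned while expanding an earlier cluster
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical search terms with hand-picked 2-D points (illustration only).
terms = ["outage map", "outage near me", "power outage",
         "tv lineup", "channel lineup", "tv guide", "careers"]
points = [(1.0, 1.2), (1.3, 0.9), (1.1, 1.0),
          (6.0, 5.5), (6.2, 5.4), (5.9, 5.8), (10.0, 0.0)]

labels = dbscan(points, eps=0.6, min_samples=2)
for term, label in zip(terms, labels):
    print(f"{label:>2}  {term}")
```

With these toy points, the three outage terms end up in one cluster, the three TV terms in another, and "careers" is left as noise because nothing else sits near it. In practice, a library implementation such as scikit-learn's `DBSCAN` would typically be used instead of a hand-written loop.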

Once we have our Search Term Clusters identified, we can begin to categorize them so that you can easily prioritize optimizations. If the searches within a cluster make up a large share of your total searches, it is classified as a Large Cluster. Otherwise, it is a Small Cluster. Similarly, clusters with a high click-through rate are tagged as Performing Well, whereas those with a lower click-through rate are tagged as Needs Attention.
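As a rough sketch of this categorization step, the Python below tags a cluster by its share of total searches and its click-through rate. The cutoff values `LARGE_SHARE` and `GOOD_CTR`, the `Cluster` type, and the sample numbers are hypothetical, since the actual thresholds are not stated here. The performance labels follow the Performing Well vs. Needs Attention wording from the list earlier in this post.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration; the real thresholds are not published here.
LARGE_SHARE = 0.05  # cluster holds at least 5% of all searches
GOOD_CTR = 0.30     # at least 30% of the cluster's searches led to a click


@dataclass
class Cluster:
    name: str
    searches: int  # number of searches that fell into this cluster
    clicks: int    # number of those searches that led to a click


def categorize(cluster, total_searches):
    """Return (size, performance) labels for one cluster."""
    share = cluster.searches / total_searches
    ctr = cluster.clicks / cluster.searches if cluster.searches else 0.0
    size = "Large Cluster" if share >= LARGE_SHARE else "Small Cluster"
    performance = "Performing Well" if ctr >= GOOD_CTR else "Needs Attention"
    return size, performance


# Made-up numbers: a busy topic with few clicks is the first one to fix.
vaccine = Cluster("covid vaccine appointment", searches=800, clicks=40)
print(categorize(vaccine, total_searches=10_000))
```

A Large Cluster that Needs Attention, like the made-up example here, is the kind of result that would go to the top of the optimization list.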

What else is new?

Checking out your Search Term Clusters is a great way to level up your Answers experience by synthesizing key customer insights and closing content gaps. Take a look at our Hitchhikers module on clusters for more information on how to use the feature in your Yext account.

Plus, clustering is just one of the many ways Yext is using AI to create smarter search experiences. Check out our Answers and Analytics pages to learn even more about Yext's algorithms and tools for finding insights in your search data.
