Search Term Clustering | Hitchhikers Platform
What You’ll Learn
In this section, you will learn:
- How we can group similar terms into ‘clusters’ for analysis
- Where to find Search Term Clusters in the platform
- Use cases for reviewing and using clusters
What are Clusters?
Search Term Clustering automatically groups together Search Terms with similar meanings, which makes it easier to analyze and understand the questions consumers are asking on an Answers Experience. Imagine our Turtlehead Tacos site search received the following search terms in a given period:
- how do i check my gift card balance (30 sessions)
- gift card balance (20 sessions)
- card balance (15 sessions)
- check balance (15 sessions)
- account balance (10 sessions)
- find balance (5 sessions)
- egift card balance (5 sessions)
Individually, these search terms may not meet your threshold for review, but added together they represent 100 sessions from users who want to check their gift card balance! Clearly it’s something Turtlehead Tacos users want to learn about, and it’s important that we have a good answer no matter how users search for this information.
Search Term Clusters go hand in hand with Search Term Labels, which you learned about in the Reviewing and Scoring Search Terms unit and which allow users to group Search Terms manually.
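The arithmetic above can be sketched in a few lines of Python. The session counts come straight from the example list; the per-term review threshold of 50 sessions is a hypothetical value chosen only for illustration, and the function names are not part of the platform.

```python
# Session counts copied from the Turtlehead Tacos example list.
gift_card_terms = {
    "how do i check my gift card balance": 30,
    "gift card balance": 20,
    "card balance": 15,
    "check balance": 15,
    "account balance": 10,
    "find balance": 5,
    "egift card balance": 5,
}

# Hypothetical per-term review threshold (an assumption, not a platform setting).
REVIEW_THRESHOLD = 50


def total_sessions(terms):
    """Sum the sessions across every search term in the group."""
    return sum(terms.values())


def terms_meeting_threshold(terms, threshold):
    """Return the terms that would qualify for review on their own."""
    return [term for term, sessions in terms.items() if sessions >= threshold]


cluster_total = total_sessions(gift_card_terms)
reviewed_individually = terms_meeting_threshold(gift_card_terms, REVIEW_THRESHOLD)

print(f"Terms meeting the threshold alone: {len(reviewed_individually)}")
print(f"Sessions for the whole group: {cluster_total}")
```

With this threshold, no single term qualifies for review, but the group as a whole clearly does.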
How does Clustering Work?
To develop the clusters, we roughly take the following steps:
Understand the Intent of the Search Term using BERT
The first step in cluster analysis is deriving the intent of each search term. Two search terms might share the same word (say, “check”) but have very different intents (“check my account balance” vs. “how do I order checks”). Using BERT, we can leverage natural language understanding to determine the actual meaning of each token (word) in a search term in context. For example, the word “check” in “check my account balance” means “to look”, while the word “checks” in “how do I order checks” refers to the banking product. Each search term is analyzed to derive a vector representation of that term, which is just a fancy way of saying that these BERT embeddings give us a numeric basis for comparing search terms.
You can learn more about BERT here.
Determine the “Closeness” of Groups of Search Terms
Once we have the vector representations of all search terms for an experience, we use common data science clustering algorithms to calculate which terms are close enough (in vector space) to constitute a “cluster”. We also identify clusters that have too much noise, or too many individual search terms to be meaningful, and mark those as Unclustered.
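To make “closeness in vector space” concrete, here is a minimal sketch. It is not the platform’s algorithm: the two-dimensional vectors are hand-made stand-ins for real BERT embeddings, the greedy grouping and the 0.9 cosine similarity cutoff are assumptions chosen for illustration, and treating single-term groups as Unclustered is a simplification of the noise handling described above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_by_closeness(vectors, min_similarity=0.9, min_cluster_size=2):
    """Greedy grouping: each term joins the first group whose seed vector
    is similar enough; otherwise it starts a new group."""
    groups = []  # list of (seed_vector, member_terms)
    for term, vector in vectors.items():
        for seed, members in groups:
            if cosine_similarity(seed, vector) >= min_similarity:
                members.append(term)
                break
        else:
            groups.append((vector, [term]))

    clusters = [m for _, m in groups if len(m) >= min_cluster_size]
    unclustered = [t for _, m in groups if len(m) < min_cluster_size for t in m]
    return clusters, unclustered


# Hypothetical 2-D "embeddings"; real BERT vectors have hundreds of dimensions.
toy_vectors = {
    "gift card balance": (0.90, 0.10),
    "check balance": (0.85, 0.20),
    "egift card balance": (0.95, 0.05),
    "order checks": (0.10, 0.90),
    "how do i order checks": (0.15, 0.95),
    "store hours": (0.70, 0.70),
}

clusters, unclustered = group_by_closeness(toy_vectors)
for members in clusters:
    print("Cluster:", members)
print("Unclustered:", unclustered)
```

In this toy data, the balance-related terms and the check-ordering terms land in separate groups even though both contain the word “check”, because the grouping depends on the vectors and not on shared words.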
Once a cluster has been identified, we name it after the most popular search term in that cluster (determined by number of sessions), e.g. “How do I check my Gift Card balance?” for our example cluster above.
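The naming rule can be sketched as picking the term with the highest session count. The function name is hypothetical, and the sketch returns the raw search term without the display formatting shown above; ties go to whichever term appears first.

```python
def name_cluster(session_counts):
    """Return the search term with the most sessions in the cluster."""
    if not session_counts:
        raise ValueError("a cluster needs at least one search term")
    return max(session_counts, key=session_counts.get)


# A subset of the example cluster, with its session counts.
example_cluster = {
    "gift card balance": 20,
    "how do i check my gift card balance": 30,
    "card balance": 15,
}

cluster_name = name_cluster(example_cluster)
print(cluster_name)
```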
This process runs on a weekly basis, taking into account the previous 120 days of search data.
Viewing Clusters in the Platform
Search Term Clusters Screen
Search Term Clusters can be found on the Answers > Experiences > Search Term Clusters screen. Here you’ll see a table of Search Term Clusters, each representing the different ways users ask the same fundamental question. Expanding a cluster shows the search terms that were grouped together. The metrics show how important each question is to users and how the clusters are performing in search.
Search Term Clusters Screen with Cluster Performance
In the Summer ’21 Release we’re introducing Cluster Performance, which will make it easier for admins to optimize their search experiences at scale. Search term clusters will automatically be categorized into four groups based on size and quality:
- Needs Attention - Large Cluster
- Needs Attention - Small Cluster
- Performing Well - Small Cluster
- Performing Well - Large Cluster
Here’s how we define Size and Quality:
- Size - A cluster is considered Large if it represents more than 10% of total searches.
- Quality - A cluster is considered Performing Well if it has a click-through rate greater than or equal to 30%.
This makes it easy to see what your users are searching for and understand how those clusters are performing so you can focus your attention on the clusters you need to improve. Click on a hero number to filter down to that category. Filter to the Needs Attention categories to quickly determine your priorities for optimization.
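The Size and Quality rules can be sketched as a small function. The thresholds (more than 10% of total searches, click-through rate of at least 30%) and the category names come from the definitions above; the function name and its inputs are assumptions for illustration, not a platform API.

```python
LARGE_SHARE = 0.10  # Large if the cluster is MORE than 10% of total searches
GOOD_CTR = 0.30     # Performing Well if CTR is at least 30%


def categorize_cluster(cluster_searches, total_searches, click_through_rate):
    """Return one of the four Cluster Performance categories."""
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")

    share = cluster_searches / total_searches
    size = "Large" if share > LARGE_SHARE else "Small"
    quality = "Performing Well" if click_through_rate >= GOOD_CTR else "Needs Attention"
    return f"{quality} - {size} Cluster"


# 150 of 1,000 searches (15%) with a 22% click-through rate.
print(categorize_cluster(150, 1000, 0.22))
# 50 of 1,000 searches (5%) with a 30% click-through rate.
print(categorize_cluster(50, 1000, 0.30))
```

Note the asymmetry in the rules: a click-through rate of exactly 30% counts as Performing Well, while the definition of Large requires strictly more than 10% of searches.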
Filtering by Clusters
A good way to scan and isolate the most popular clusters is by using the filters on the Search Terms screen.
In the Clusters filter, you’ll see a list of clusters derived from the past 120 days of data, as of the last time the cluster analysis was run. These are sorted in descending order of the number of Search Terms in the cluster.
When you export search terms from the Search Terms screen, you’ll see a column identifying the cluster the search term belongs to. If there is no cluster assigned, it’ll appear as “Unclustered”.
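Once you have an export, you can summarize it by cluster yourself. The sketch below assumes a CSV with columns named “Search Term”, “Sessions”, and “Cluster”; those column names and the sample rows are hypothetical, so check the header of your own export. It counts search terms per cluster and sorts in descending order, mirroring the ordering of the Clusters filter.

```python
import csv
import io
from collections import Counter

# Hypothetical sample export; real column names may differ.
SAMPLE_EXPORT = """Search Term,Sessions,Cluster
gift card balance,20,how do i check my gift card balance
card balance,15,how do i check my gift card balance
check balance,15,how do i check my gift card balance
taco tuesday deals,3,Unclustered
"""


def terms_per_cluster(csv_text, cluster_column="Cluster"):
    """Count search terms per cluster, largest cluster first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[cluster_column] for row in reader)
    return counts.most_common()


summary = terms_per_cluster(SAMPLE_EXPORT)
for cluster, term_count in summary:
    print(f"{cluster}: {term_count} search term(s)")
```

Rows with no assigned cluster are counted under “Unclustered”, the same value the export uses.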
Clusters in Analytics
Search Term Clusters can be used in Report Builder as a dimension and filter. Check out the Answers Analytics in Report Builder unit to learn more!