Bot Traffic in Search | Yext Hitchhikers Platform

What is bot traffic?

Bot traffic is any traffic on your website that is not generated by a human being. It is estimated that 40% of all traffic on the web today comes from bots.

Not all bots are bad or malicious. Bots include legitimate crawlers for search engines and digital assistants like Google or Siri, which crawl websites to populate their search indexes. These bots are usually clearly identified. For example, Google publishes the user agents of all of its bots, which makes it easy to identify traffic from these bots and exclude it from reporting, if desired.

On the other hand, some bots can be malicious, for example those used for content and data scraping, credential stuffing, or any number of other bot attacks. These bots often “spoof” their user agents; in other words, they set their user agents to mirror those of popular web browsers like Chrome or Safari, so they are harder to detect.
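To illustrate why self-identifying crawlers are easy to filter, here is a minimal Python sketch of a user-agent check. The token list and the function name are assumptions made for this example; they are not the full published list from Google or any other vendor, and this is not how Yext classifies traffic. A check like this only catches bots that declare themselves, so it will not flag a bot that spoofs a browser user agent.

```python
# Illustrative crawler tokens only (an assumption for this sketch);
# consult each vendor's published list for real use.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot", "applebot")


def is_declared_crawler(user_agent: str) -> bool:
    """Return True if the user agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)
```

For example, a user agent containing "Googlebot/2.1" would match, while a standard Chrome user agent (or a bot imitating one) would not.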

Accurately identifying bot traffic is important both for preventing such bot attacks and, more generally, for understanding which traffic on your site comes from human users. For Yext, this means providing accurate analytics for your search experience (how many searches were run, how many results were clicked, and so on) and counting only searches conducted by humans against search capacity.

How does Yext identify bot searches?

Yext uses Cloudflare Bot Management to identify bot traffic for all Search experiences.

Cloudflare is one of the biggest Content Delivery Networks (CDNs) in the world, and as such, they can collect data from the billions of traffic requests happening on their network every day. With all this data, Cloudflare has been able to develop machine learning techniques to identify likely bot traffic based on common markers like user agent or IP address, or on more advanced behavioral techniques.

Yext uses Cloudflare as its CDN for Search, so all Search requests are already routed through Cloudflare’s network. So whenever a Search request is placed on Yext, in addition to fulfilling the request, Cloudflare also provides Yext with a prediction of whether or not that search was likely placed by a bot.

You can read more about Cloudflare Bot Management in Cloudflare's documentation.

Finally, in addition to Cloudflare Bot Management, Yext maintains its own blacklist of IP addresses and ASNs that are known sources of bot traffic. If a search is placed from one of these sources and is not caught by Cloudflare, it is still excluded.
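The two layers described above can be sketched roughly as follows. This is a hedged illustration, not Yext's implementation: the SearchRequest type, the boolean cloudflare_likely_bot field, and the blocklist entries are all hypothetical, and the sample IP range and ASN come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical blocklist entries using documentation-reserved values, not real data.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_ASNS = {64496}


@dataclass
class SearchRequest:
    ip: str
    asn: int
    cloudflare_likely_bot: bool  # assumed boolean form of the CDN's prediction


def is_bot_search(req: SearchRequest) -> bool:
    """Flag a search if either the CDN prediction or the blocklist catches it."""
    if req.cloudflare_likely_bot:
        return True
    addr = ipaddress.ip_address(req.ip)
    if any(addr in network for network in BLOCKED_NETWORKS):
        return True
    return req.asn in BLOCKED_ASNS
```

The blocklist check runs only when the CDN prediction does not flag the request, which mirrors its role as a fallback for traffic the first layer misses.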

Note: Cloudflare Bot Management also requires a cookie (__cf_bm), which should be considered strictly necessary and hence should not require user consent. You can read more about the __cf_bm cookie in Cloudflare's documentation.

Does Yext count bot searches towards my search capacity?

No. Any searches that are determined to have likely been placed by a bot are automatically excluded from search capacity.

Also, if Cloudflare indicates that a query is likely bot traffic, Search will not return results from third-party verticals (e.g., GCSE).
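As a rough illustration of these two rules, the Python sketch below counts only non-bot searches toward capacity and drops third-party verticals for flagged queries. The function names, the likely-bot flag, and the "gcse" vertical key are assumptions made for this example; they do not describe Yext's internal logic or configuration.

```python
from typing import Iterable, List

# Hypothetical vertical key for illustration; real keys depend on your configuration.
THIRD_PARTY_VERTICALS = {"gcse"}


def count_toward_capacity(likely_bot_flags: Iterable[bool]) -> int:
    """Count only the searches that were not flagged as likely bot traffic."""
    return sum(1 for likely_bot in likely_bot_flags if not likely_bot)


def verticals_for_query(verticals: List[str], likely_bot: bool) -> List[str]:
    """Drop third-party verticals when the query is flagged as likely bot traffic."""
    if not likely_bot:
        return list(verticals)
    return [v for v in verticals if v not in THIRD_PARTY_VERTICALS]
```

For instance, three searches of which one is flagged would count as two toward capacity under this sketch.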

Does bot traffic affect my search analytics?

Bot searches and any user events associated with them are automatically excluded from analytics screens in the Searches tab, such as Search Terms, Search Term Clusters, and the Search Log.

However, bot traffic can still be viewed in the platform in a custom report built with Report Builder. You can use the “Is Human” dimension / filter to distinguish between bot and human traffic.

What limitations are there on Yext bot identification?

Yext bot identification can only be used on Search requests made from a consumer browser, such as Chrome or Firefox. It does not work with server-side integrations with the Search API, such as making search requests through a reverse proxy server. Additionally, bot identification is not guaranteed to be 100% accurate. Misclassification is unlikely, but it is always possible.
