Robots.txt file and URL query strings

Alyssa_Hubbard · April 24, 2020, 9:33pm

Hi Everyone,

My client reached out because they noticed that, since launching Yext Answers, their URLs end with a ? (e.g. their homepage is now www.mydomain.com?)

I wanted to better understand how our robots.txt file works in terms of disallowing query strings. Is it an issue that their URLs now have this structure? Can you elaborate on how the robots.txt file prevents our Answers production page from being crawled?

Thanks!
Alyssa

afarooque · April 27, 2020, 1:44pm

Hi Alyssa,

There are a few questions here, so I’ll answer them one by one!

How does the robots.txt file work?
The robots.txt file lets you specify which user-agents are allowed to crawl your site, where the sitemap listing all the URLs on your domain lives, and which URL patterns crawlers should stay away from.

For a client-hosted (iframe) implementation, your robots.txt file might look like this:

User-agent: *
Sitemap: https://domain.com/sitemap.xml
Disallow: /

The most important line here is the Disallow statement. The / tells crawlers not to crawl anything on this domain. We do this because we do not want the iframe source URL to be indexed organically; rather, we want the page that the client places the iframe on to be indexed and ranked.
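
If you want to sanity-check a file like the one above before deploying it, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_crawl_allowed and the sample URLs are made up for illustration and are not part of any Yext tooling. Python's parser only approximates how a given search engine reads the file, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above; domain.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://domain.com/sitemap.xml
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on the iframe source domain, with and without a query string.
    for url in ("https://domain.com/", "https://domain.com/search?query=hours"):
        print(url, is_crawl_allowed(ROBOTS_TXT, url, "Googlebot"))
```

With Disallow: / in place, both URLs should come back as not allowed, since every path on the domain starts with /.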

How are query parameters handled in indexing?
It is true that URLs with query parameters can be indexed by Google. However, in your case, you do not want these to rank independently in Google.

You can achieve this by setting the canonical URL of the page to the URL without any query parameters. For example, the canonical URL of this page is below.

<link rel="canonical" href="https://hitchhikers.yext.com/community/t/robots-txt-file-and-url-query-strings/525">

Even if query parameters are added, this tag tells search engines that this is the preferred URL to index.
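
To make the idea concrete, here is a small sketch of how a page template could derive that canonical URL by dropping the query string (and fragment) from whatever URL was requested. The function names canonical_url and canonical_link_tag are hypothetical, and this assumes the query-free URL is always the version you want indexed, which may not hold for every site.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return url with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out the query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the given request URL."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # A bare trailing "?" (as in the original question) is dropped as well.
    print(canonical_url("https://www.mydomain.com?"))
    print(canonical_link_tag("https://www.mydomain.com/locations?query=store+hours"))
```

Every variant of the page, with or without parameters, then points search engines at the same URL.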

Hope this helps!


Alyssa_Hubbard · April 27, 2020, 9:42pm

That’s very helpful! Thank you Amani!
