How to set domain parameters to exclude URLs within Crawler

Sean_Coleman · May 3, 2021, 2:43pm

Hi team! I am working with a client who uses the Crawler, and we are noticing that multiple URLs are being pulled in for a single article because identifiers are appended to the end of the URLs. These identifiers don't affect the content, so we end up with many duplicates that I want to remove.
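To see which crawled pages are duplicates before changing any settings, a minimal Python sketch can group URLs by a canonical form. It assumes, hypothetically, that the appended identifier is a purely numeric final path segment such as /articles/my-article/12345. The example.com URLs and the function names are placeholders, and none of this is part of the Crawler itself.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumption (hypothetical): the appended identifier is an all-digit final
# path segment, optionally followed by a trailing slash.
TRAILING_ID = re.compile(r"/\d+/?$")

def canonical(url: str) -> str:
    """Strip the trailing identifier segment and the fragment from a URL."""
    parts = urlsplit(url)
    path = TRAILING_ID.sub("", parts.path) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))

def find_duplicates(urls):
    """Group URLs that collapse to the same canonical article URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical(url)].append(url)
    # Keep only canonical URLs that more than one crawled URL maps to.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each key in the result is a candidate "real" article URL, and its list shows the identifier variants the Crawler picked up for it.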

My idea is to set a rule so that, going forward, the Crawler would not crawl a URL if there is anything after a particular path segment (the /* part). How exactly would I do that? Essentially, I want to exclude every URL that has an identifier appended at the end.
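The exclusion rule described above can be expressed as a simple check: skip a URL if it has a second, non-empty path segment after the article slug. This is a minimal Python sketch of that matching logic only, assuming hypothetically that articles live at /articles/<slug>. It is not the Crawler's configuration syntax, and ARTICLE_PREFIX and should_exclude are illustrative names.

```python
from urllib.parse import urlsplit

# Hypothetical layout: article pages live at /articles/<slug>, and any extra
# segment after the slug is the appended identifier we want to skip.
ARTICLE_PREFIX = "/articles/"

def should_exclude(url: str, prefix: str = ARTICLE_PREFIX) -> bool:
    """Return True if the URL has anything after /articles/<slug>/."""
    path = urlsplit(url).path
    if not path.startswith(prefix):
        return False  # outside the article section: leave it alone
    # Split the rest of the path into non-empty segments, so a trailing
    # slash on the canonical article URL does not count as an extra segment.
    segments = [s for s in path[len(prefix):].split("/") if s]
    return len(segments) > 1
```

If the Crawler's exclusion setting accepts regular expressions, the same rule is roughly /articles/[^/]+/.+ ; check the product documentation for the pattern syntax it actually supports.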

Thanks in advance for the help!

Related topics:

Topic · Category · Replies · Views · Last activity
How to correctly use Blacklisted URLs on Crawler? · Content · 4 · 1002 · March 10, 2022
Ignore Query Parameters in Crawler (Summer '21 Release) · Summer '21 Release · 0 · 1330 · July 24, 2021
Setting Up Crawler for PDFs · Content · 6 · 2109 · February 22, 2023
Crawl pdf data each specific page · Content (spring21-release) · 1 · 1292 · July 5, 2022
Robots txt. file and URL query strings · Search · 2 · 1250 · April 27, 2020
