Step 4: Sitemap Source Type Settings
If you’ve chosen Sitemap as your source type, enter your sitemap URL and choose whether to reference the lastmod tag in your sitemap when crawling pages.
Which Pages or Domains Would You Like to Crawl?
Enter a sitemap URL. To get your sitemap URL, refer to the documentation for your content management system, or ask your web team. Below are instructions for some content management systems:
You’ll also choose whether the crawler should check the lastmod tag on URLs in your sitemap when performing a crawl. When this option is selected, only URLs that have been modified since the last crawl are crawled again.
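To illustrate the lastmod option, here is a minimal Python sketch of how a crawler might pick URLs to recrawl. It is not the product's actual implementation. The function names, the sample sitemap, and the choice to always recrawl URLs that have no lastmod value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a lastmod value (date or datetime) into a timezone-aware datetime."""
    # Older Python versions do not accept a trailing "Z", so normalize it.
    value = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if value.tzinfo is None:
        # Assumption: date-only values are treated as midnight UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value


def urls_to_recrawl(sitemap_xml, last_crawl):
    """Return the URLs whose lastmod is later than the previous crawl."""
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc is None:
            continue
        # Assumption: a URL without lastmod is always recrawled.
        if lastmod is None or parse_lastmod(lastmod) > last_crawl:
            selected.append(loc.strip())
    return selected


# Hypothetical sitemap used only for this example.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-01-10</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2024-03-05T08:30:00Z</lastmod></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""

last_crawl = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(urls_to_recrawl(SAMPLE, last_crawl))
# ['https://example.com/b', 'https://example.com/c']
```

In this sketch, page a is skipped because it was last modified before the previous crawl, while b (modified later) and c (no lastmod) are selected.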
Which URLs Should Be Omitted from the Crawl?
Optionally, specify any URLs you want to blacklist.
Blacklisted URLs are excluded from a crawl, even if they match your chosen crawl strategy and other settings. Enter each URL to blacklist on a separate line. You can also use wildcard notation here.
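The exact wildcard syntax the crawler accepts is not described here. As a rough sketch of the idea, the Python below assumes that * matches any run of characters and that every other character is matched literally. The function names and sample URLs are hypothetical.

```python
import re


def compile_exclusions(blacklist_text):
    """Turn a blacklist (one URL pattern per line) into compiled regexes."""
    patterns = []
    for line in blacklist_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Escape everything, then turn the escaped "*" back into a wildcard.
        regex = re.escape(line).replace(r"\*", ".*")
        patterns.append(re.compile(regex))
    return patterns


def is_excluded(url, patterns):
    """Return True if the whole URL matches any blacklist pattern."""
    return any(p.fullmatch(url) for p in patterns)


# Hypothetical blacklist: one entry per line, as entered in the settings field.
BLACKLIST = """
https://example.com/private/*
https://example.com/login
"""

patterns = compile_exclusions(BLACKLIST)
for url in [
    "https://example.com/private/reports/q1",
    "https://example.com/login",
    "https://example.com/blog/post",
]:
    print(url, is_excluded(url, patterns))
```

Under these assumptions, the first two URLs are excluded and the blog post is still crawled.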
Advanced Crawler Settings
The rate limit determines how many tasks the crawler can execute on a site at one time. Choose a value that lets the crawl run without impacting site performance.
By default, this is set to 100. You may need to consult your web team to determine an ideal rate limit for your site.
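One way to picture the rate limit is as a cap on concurrent tasks. The Python sketch below makes that assumption and uses a semaphore to enforce it. It is not the crawler's actual implementation, and crawl_all, fake_fetch, and the sample URLs are hypothetical names used only for illustration.

```python
import asyncio


async def crawl_all(urls, fetch, rate_limit=100):
    """Fetch every URL while keeping at most `rate_limit` tasks in flight."""
    semaphore = asyncio.Semaphore(rate_limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous tasks observed
    results = {}

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:  # wait here if the limit is already reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                results[url] = await fetch(url)
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker(u) for u in urls))
    return results, peak


async def fake_fetch(url):
    """Stand-in for a real page request; sleeps briefly instead."""
    await asyncio.sleep(0.01)
    return len(url)


urls = [f"https://example.com/page/{i}" for i in range(20)]
results, peak = asyncio.run(crawl_all(urls, fake_fetch, rate_limit=5))
print(len(results), "pages fetched; peak concurrent tasks:", peak)
```

With a limit of 5, all 20 pages are still fetched, but never more than 5 at once. A lower limit puts less load on the site and makes the crawl take longer.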
Save Crawler
Once you’ve configured your crawler settings, click Save Crawler at the bottom of the screen.