Detailed Page Crawl Failure Reasons (September '21 Release)

After a crawl is complete, any page the Crawler failed to crawl will show an error message explaining why that page could not be crawled.

To view the failure reasons, navigate to the Crawler and click the Pages tab. The Status Details column shows the error message for each page that failed to crawl.

For example, if the Crawler was blocked from crawling a page, you will see ‘Crawler Blocked’ in Status Details. This tells you to check that both the Crawler’s user agent and its IP addresses are whitelisted.

Below is the list of potential Failure Reasons you may see:

  • Unknown Page Error: A transient error independent of the scraper system.
  • Error Page: The page loaded successfully but was identified as an error page.
  • Page Time Out: The page timed out while loading and could not be crawled.
  • Crawler Blocked: The crawler was prevented from loading the page.
  • Unknown Crawler Error: A transient error within the scraper system.
  • Page Size Limit: The page size exceeded the limit imposed by the crawler system.
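
If you keep your own record of failed pages, the failure reasons above can be tallied with a short script. The following is a minimal sketch in Python, not part of the product: the function names and the (URL, Status Details) pair format are assumptions made for illustration. The reason names and descriptions are taken from the list above, and only the two reasons described there as transient are flagged.

```python
from collections import Counter

# Failure reasons and descriptions, copied from the list above.
FAILURE_REASONS = {
    "Unknown Page Error": "A transient error independent of the scraper system.",
    "Error Page": "The page loaded successfully but was identified as an error page.",
    "Page Time Out": "The page timed out while loading and could not be crawled.",
    "Crawler Blocked": "The crawler was prevented from loading the page.",
    "Unknown Crawler Error": "A transient error within the scraper system.",
    "Page Size Limit": "The page size exceeded the limit imposed by the crawler system.",
}

# The two reasons the list describes as transient.
TRANSIENT_REASONS = {"Unknown Page Error", "Unknown Crawler Error"}


def summarize_failures(pages):
    """Count pages per known failure reason.

    `pages` is an iterable of (url, status_details) pairs. This shape is
    hypothetical; adapt it to however you record the Status Details column.
    """
    counts = Counter()
    for _url, status_details in pages:
        if status_details in FAILURE_REASONS:
            counts[status_details] += 1
    return counts


def transient_urls(pages):
    """Return the URLs whose failure reason is described as transient."""
    return [url for url, status_details in pages
            if status_details in TRANSIENT_REASONS]
```

For instance, passing a list such as [("https://example.com/a", "Crawler Blocked"), ("https://example.com/b", "Unknown Page Error")] to summarize_failures gives one count for each reason, and transient_urls returns only the second URL.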

To learn more about the Crawler, visit the Yext Site Crawler & Crawler Connector training module.