Step 1: Was the Expected Data Crawled?

First, check whether the crawler triggered a connector run as expected. Then check whether the crawler captured the missing data, so that it was available for the connector to pull.

If pages are missing, or if crawled pages have missing or incorrect data, this is likely a crawler issue. Follow the steps below to continue troubleshooting.

Connector Trigger

Check whether the crawl you are troubleshooting actually triggered a connector run. A crawl triggers a new connector run only if it completed and an update was detected on at least one page.
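The trigger rule above can be written as a single condition. The sketch below is only an illustration: `CrawlSummary`, its field names, and the `"COMPLETED"` status value are hypothetical and do not reflect the product's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class CrawlSummary:
    # Hypothetical fields, chosen for illustration only.
    status: str         # e.g. "COMPLETED", "FAILED", "IN_PROGRESS"
    pages_updated: int  # number of pages on which an update was detected


def should_trigger_connector(crawl: CrawlSummary) -> bool:
    """Return True only for a completed crawl with at least one updated page."""
    return crawl.status == "COMPLETED" and crawl.pages_updated >= 1
```

If a crawl finished but no page changed, or the crawl did not complete, this condition is false and no new connector run should be expected.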

For more on connector triggers, see the Connector Triggers reference.

Missing Pages

If the page was not crawled at all, then the issue is likely with the configuration of the crawler.

Check this by searching for the URL of the missing page in the Details page of the most recent crawl: if the URL is not present on the Details page, the page was not crawled.

Follow these steps to troubleshoot:

  1. Check the configuration of the crawler. Confirm that the chosen crawl strategy would have reached the intended page.
  2. In the crawl results, find a page that was crawled successfully and contains a link to the missing page.
  3. Confirm that the HTML of the crawled page actually contains the correct URL of the missing page in the href attribute of a link (an <a> tag).
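The last check can be scripted once you have the HTML of the linking page saved locally. The following is a minimal sketch using only the Python standard library. The helper names `LinkCollector` and `page_links_to` are made up for this example, and the matching logic (resolve relative links, ignore fragments) is an assumption that may not match the crawler's own link-discovery rules.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def page_links_to(html, page_url, target_url):
    """Return True if the HTML of page_url contains a link to target_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    target, _fragment = urldefrag(target_url)
    return target in collector.links
```

This sketch compares URLs exactly after resolving them, so differences such as a trailing slash or http versus https count as a mismatch. That is often the kind of discrepancy worth spotting when a page is unexpectedly missing from a crawl.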

Missing or Incorrect Data

If a page was crawled and an entity was created, but some data is missing or incorrect, you’ll need to determine whether the issue came from the crawler or the connector.

Check the crawled page data

For the crawled page with missing or incorrect data, find the Page ID extracted by the connector.

  1. Go to the Details page for the most recent crawl. Search for the incorrectly crawled page using the Page ID.
  2. Click Download HTML on the incorrectly crawled page. This is the extracted HTML that the connector ingested.

If the HTML of the crawled page is as expected, then the issue is with the connector, not the crawler. Skip to the next step of this guide to continue to troubleshoot.

Recrawl the page

If the HTML ingested by the connector is not the same as what appears for that page on your site, try to recrawl the page.
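To see exactly where the two versions differ, you can compare the downloaded HTML with a copy of the page source saved from your site. This is a minimal sketch using Python's standard `difflib` module. The function name `html_diff` and the file labels are arbitrary choices for this example, and both inputs are assumed to already be available as strings.

```python
import difflib


def html_diff(crawled_html, live_html):
    """Return unified-diff lines between the crawled HTML and the live HTML."""
    return list(
        difflib.unified_diff(
            crawled_html.splitlines(),
            live_html.splitlines(),
            fromfile="crawled.html",  # label only, no file is read
            tofile="live.html",       # label only, no file is read
            lineterm="",
        )
    )
```

An empty result means the two inputs are identical line for line. Lines starting with `-` appear only in the crawled copy, and lines starting with `+` appear only in the live copy. Note that a page source saved before JavaScript runs can differ from what the browser eventually renders, so compare like with like.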

If the recrawl is not successful, it could be due to one of these issues:

  • JavaScript taking over 10 seconds to load: in this case the crawler does not pick up the HTML
  • Crawled site error: any one-off issues or outages on the crawled site that affected page load

If the recrawl is not successful, and it is not due to JavaScript load time or an issue on the crawled site, contact Yext Support to investigate further.