Administrators can now view the raw HTML of crawled pages to see exactly which data was crawled. This makes it easier to confirm whether the crawler successfully accessed the content they are looking for.
To view the HTML of a crawled page, navigate to the Crawler and click the Pages tab. A View HTML button now appears next to each Page URL.
When you click View HTML, an external page opens that displays the raw HTML of the crawled page. It will look something like this:
Viewing the raw HTML that the crawler retrieved is one of the primary methods of debugging the crawler, because it helps identify whether an error originates with the webpage, the crawler, or the connectors.
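As an illustration of this kind of debugging, the sketch below checks whether phrases you expect to find on a page actually appear in its visible text. It assumes you have copied the raw HTML from the View HTML page into a string yourself. The names TextCollector, find_missing_phrases, and sample_html are hypothetical and are not part of any Yext API. It uses only Python's standard library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def find_missing_phrases(raw_html, expected_phrases):
    """Return the expected phrases that do not appear in the visible text."""
    parser = TextCollector()
    parser.feed(raw_html)
    parser.close()
    visible_text = " ".join(parser.chunks).lower()
    return [p for p in expected_phrases if p.lower() not in visible_text]


# Hypothetical raw HTML, standing in for what you copied from View HTML.
sample_html = """<html><head><script>var label = "Store Hours";</script></head>
<body><h1>Downtown Location</h1><p>Call us at 555-0100</p></body></html>"""

missing = find_missing_phrases(sample_html, ["Downtown Location", "Store Hours"])
print(missing)
```

In this sample, "Store Hours" is reported as missing because it appears only inside a script, not in the page markup. A result like this may suggest that the content is rendered by JavaScript on the webpage rather than pointing to a problem with the crawler or connector, although that interpretation depends on how the page is built.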
With access to the raw crawled HTML for each page, users can debug issues encountered during crawls.
To learn more about the Crawler, visit the Yext Site Crawler & Crawler Connector training module.