Introduction | Yext Hitchhikers Platform

Overview

The Yext site crawler scrapes webpages to extract data. You can then use that data, through a connector, to create and update entities in the Knowledge Graph.

The crawler is an ideal data source for these use cases:

  • Pulling data from unstructured, long-form content (e.g., blog posts, help articles, press releases)
  • Creating entities that do not require highly structured data (e.g., FAQs, posts, articles)

In this guide, you will:

  • Whitelist the necessary IP addresses to allow your site to be crawled
  • Configure the crawler settings, including which pages to crawl, how often to crawl them, and other specifications
  • Create a connector to create or update entities using the crawled site data
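
As a rough illustration of the whitelisting step, the sketch below checks whether an incoming request's IP address falls inside an allowed set. This is the kind of rule a firewall or web server applies once the crawler's addresses are whitelisted. The addresses shown are hypothetical placeholders taken from reserved documentation ranges, not Yext's actual crawler IPs, and the `is_allowed` helper is an assumption made for this example rather than part of any Yext tooling.

```python
import ipaddress

# Hypothetical placeholder ranges (reserved documentation blocks).
# Replace these with the crawler IP addresses that Yext publishes.
CRAWLER_ALLOWLIST = ["192.0.2.10/32", "198.51.100.0/24"]


def is_allowed(client_ip: str, allowlist=CRAWLER_ALLOWLIST) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

In practice this rule usually lives in a firewall, CDN, or web server configuration rather than in application code; the sketch only shows the matching logic.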
Note
Use the Yext site crawler only on websites and domains that your organization owns and operates. Do not use the crawler on any third-party websites or domains.

More Resources

For more information on the site crawler tool, see the Crawler Product reference.

For more on using the crawler as a connector source to pull data into the Knowledge Graph, see the Crawler Source reference.

For general troubleshooting steps and FAQs about connectors, see the Connectors FAQ and Troubleshooting guide.