Yext Crawler

The Yext Crawler helps you automatically populate your Knowledge Graph based on websites that you can crawl.
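For orientation, here is a minimal sketch of a crawler configuration assembled from the fields documented below. The field values are illustrative, and any surrounding resource envelope (file name, schema pointer, account wiring) is omitted because it is not covered by this reference:

```json
{
  "$id": "example-crawler",
  "name": "Example Site Crawler",
  "enabled": true,
  "crawlSchedule": "weekly",
  "crawlStrategy": "subPages",
  "domains": ["https://www.example.com"],
  "rateLimit": 100,
  "maxDepth": 10
}
```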

$id (string)

The unique identifier for the Yext Crawler resource.

name (string) Required

The display name of the crawler.

enabled (boolean)

Default: true

If true, the crawler will run according to its crawl schedule.

crawlSchedule (enum of string)

Default: "weekly"

Defines how often the crawler will index the website.

Must be one of:

  • "once"
  • "daily"
  • "weekly"

crawlStrategy (enum of string)

Default: "subPages"

Specifies the crawl strategy of the crawler.

Must be one of:

  • "allPages"
  • "subPages"
  • "specificPages"

domains (array of string)

A list of domains or URLs to crawl, e.g. https://www.example.com

Must contain a minimum of 1 item

Each item of this array must be:

Type: string

ignoreQueryParameterOption (enum of string)

Default: "none"

Option for ignoring query parameters when differentiating crawled URLs.

Must be one of:

  • "none"
  • "all"
  • "specificParameters"

ignoreQueryParametersList (array of string)

Any query parameters specified in the list will be ignored when differentiating crawled URLs.

Each item of this array must be:

Type: string
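The two query-parameter fields work together. A sketch that ignores only specific parameters (the parameter names below are examples, not defaults):

```json
{
  "ignoreQueryParameterOption": "specificParameters",
  "ignoreQueryParametersList": ["utm_source", "utm_campaign", "sessionid"]
}
```

With this configuration, https://www.example.com/page?utm_source=ad and https://www.example.com/page would be treated as the same URL when differentiating crawled pages.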

blacklistedUrls (array of string)

Any URLs that match any regex rule in the list will be omitted from the crawl.

Each item of this array must be:

Type: string
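Because each entry is a regex rule, literal dots and other metacharacters should be escaped. A sketch that omits login and print-view pages (the patterns are examples):

```json
{
  "blacklistedUrls": [
    "https://www\\.example\\.com/login.*",
    ".*\\?print=true.*"
  ]
}
```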

subPagesUrlStructures (array of string)

Specified wildcard URLs will also be considered when using the Sub Pages crawl strategy, e.g. www.yext.com/bad-website/blog/*

Each item of this array must be:

Type: string
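A sketch combining the Sub Pages strategy with an extra wildcard structure, so the crawl also follows pages under a path outside the root URL (the domains and paths are illustrative):

```json
{
  "crawlStrategy": "subPages",
  "domains": ["https://www.example.com/products"],
  "subPagesUrlStructures": ["www.example.com/blog/*"]
}
```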

headers (array of object)

Custom header values that will be passed to each crawled page.

Each item of this array must be:

Type: object, with two required string fields: the header's name and its value.
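A minimal sketch of a headers entry. This reference does not name the object's two required string fields, so the keys below are an assumption, shown only to illustrate the shape:

```jsonc
{
  "headers": [
    {
      // "name" and "value" are assumed keys; the schema above only
      // specifies two required strings (the header name and its value)
      "name": "X-Crawler-Token",
      "value": "example-token"
    }
  ]
}
```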

fileTypes

Default: "allTypes"

Specifies which file types, if encountered, to crawl. The value must take one of the following two forms:

  • Type: array of enum (of string). Specifies which file types should be crawled. Must contain a minimum of 1 item, and each item must be one of:
      • "HTML"
      • "PDF"
  • Type: const. Specific value: "allTypes". If selected, all supported file types will be crawled, including any added in the future.
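A sketch of the two accepted shapes. To crawl only specific file types:

```json
{ "fileTypes": ["HTML", "PDF"] }
```

Or, to crawl every supported file type, including any added in the future:

```json
{ "fileTypes": "allTypes" }
```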

rateLimit (integer)

Default: 100

Specifies the maximum number of concurrent crawls.

Value must be greater than or equal to 1 and less than or equal to 15000

maxDepth (integer)

Default: 10

Specifies the number of levels past your root URLs for the crawler to index.

Value must be greater than or equal to 0 and less than or equal to 100
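As a closing sketch, a conservative configuration tuned well inside the documented bounds (1 to 15000 for rateLimit, 0 to 100 for maxDepth); the values are illustrative:

```json
{
  "rateLimit": 25,
  "maxDepth": 3
}
```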