Yext Crawler
The Yext Crawler automatically populates your Knowledge Graph with content crawled from websites that you specify.
$id string
The unique identifier for the Yext Crawler resource.
$schema const
"https://schema.yext.com/config/crawler/site-crawler/v1"
name string Required
The display name of the crawler.
enabled boolean
If true, the crawler will run according to its crawl schedule.
crawlSchedule enum (of string)
Defines how often the crawler will index the website.
Must be one of:
- "once"
- "daily"
- "weekly"
crawlStrategy enum (of string)
Specifies the crawl strategy of the crawler.
Must be one of:
- "allPages"
- "subPages"
- "specificPages"
domains array of string
A list of domains or URLs to crawl, e.g. https://www.example.com
Must contain a minimum of 1 item
Each item of this array must be: string
ignoreQueryParameterOption enum (of string)
Option for ignoring query parameters when differentiating crawled URLs.
Must be one of:
- "none"
- "all"
- "specificParameters"
ignoreQueryParametersList array of string
Any query parameters specified in the list will be ignored when differentiating crawled URLs.
Each item of this array must be: string
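To show how these two settings interact, here is a minimal Python sketch of how a crawler might normalize URLs before deduplicating them. The function name and exact matching semantics are assumptions for illustration, not the crawler's actual implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def normalize_url(url, option="none", ignored_params=()):
    """Normalize a URL according to ignoreQueryParameterOption
    (illustrative sketch, not Yext's implementation).

    option: "none" keeps the query string as-is, "all" drops it
    entirely, and "specificParameters" drops only the parameters
    listed in ignored_params (ignoreQueryParametersList).
    """
    parts = urlparse(url)
    if option == "all":
        query = ""
    elif option == "specificParameters":
        kept = [(k, v) for k, v in parse_qsl(parts.query)
                if k not in ignored_params]
        query = urlencode(kept)
    else:
        query = parts.query
    return urlunparse(parts._replace(query=query))
```

With `option="specificParameters"` and `ignored_params={"utm_source"}`, the URLs `https://www.example.com/page?utm_source=a&id=7` and `https://www.example.com/page?utm_source=b&id=7` normalize to the same string and would be treated as one crawled page.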
blacklistedUrls array of string
Any URLs that match any regex rule in the list will be omitted from the crawl.
Each item of this array must be: string
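The following sketch shows one way such regex filtering could work. Whether the crawler uses search or full-match semantics is not specified here, so this example assumes a substring-style `re.search`.

```python
import re

def is_blacklisted(url, blacklisted_urls):
    """Return True if the URL matches any regex rule in
    blacklistedUrls (assumed search semantics, for illustration)."""
    return any(re.search(rule, url) for rule in blacklisted_urls)
```

For example, the rule `/private/` would exclude `https://www.example.com/private/page` from the crawl while leaving `https://www.example.com/blog` untouched.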
subPagesUrlStructures array of string
Specified wildcard URLs will also be considered when using the Sub Pages crawl strategy, e.g. www.yext.com/bad-website/blog/*
Each item of this array must be: string
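A wildcard pattern like the one above can be matched with shell-style globbing. This is a sketch using Python's `fnmatch`; the crawler's real matching rules may differ.

```python
from fnmatch import fnmatch

def matches_structure(url, structures):
    """Check a URL against subPagesUrlStructures wildcard patterns
    (illustrative; uses shell-style globbing, where * matches any
    characters including slashes)."""
    return any(fnmatch(url, pattern) for pattern in structures)
```

Under this sketch, `www.yext.com/bad-website/blog/post-1` matches the pattern `www.yext.com/bad-website/blog/*`, while URLs outside that path do not.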
headers array of object
Custom header values that will be passed to each crawled page.
Each item of this array must be an object with two required string fields: the header name and its value.
fileTypes
Specifies which file types, if encountered, to crawl.
Must be one of the following:
- An array containing a minimum of 1 item, where each item must be one of:
  - "HTML"
  - "PDF"
- The specific value "allTypes": if selected, all supported file types will be crawled, including any added in the future.
rateLimit integer
Specifies the maximum number of concurrent crawls.
Value must be greater than or equal to 1 and less than or equal to 15000.
maxDepth integer
Specifies the number of levels past your root URLs for the crawler to index.
Value must be greater than or equal to 0 and less than or equal to 100.
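Putting the fields together, here is an illustrative configuration expressed as a Python dict. The field names follow the schema above; all values (identifier, name, domains, and so on) are made up for this example.

```python
# Illustrative Yext Crawler configuration; values are hypothetical.
crawler_config = {
    "$id": "my-site-crawler",  # made-up identifier
    "$schema": "https://schema.yext.com/config/crawler/site-crawler/v1",
    "name": "Example Site Crawler",
    "enabled": True,
    "crawlSchedule": "weekly",          # once | daily | weekly
    "crawlStrategy": "subPages",        # allPages | subPages | specificPages
    "domains": ["https://www.example.com"],
    "ignoreQueryParameterOption": "specificParameters",
    "ignoreQueryParametersList": ["utm_source", "utm_medium"],
    "blacklistedUrls": [r"/login", r"\?preview=true"],
    "fileTypes": ["HTML", "PDF"],       # or the specific value "allTypes"
    "rateLimit": 100,                   # 1..15000
    "maxDepth": 3,                      # 0..100
}
```

The inline comments restate the constraints from the schema above, so a config like this can be sanity-checked against them before being applied.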