Connector System Limits | Yext Hitchhikers Platform

Connectors

Object | Limits and Validations
Function invocation (for source and transform) | 30-second timeout
Connector Display Name | 50 characters maximum
Selectors | 2,000 characters maximum
Column Headers | 100 characters maximum
JSON Payload for Push to API Connector | 15 MB request size
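
The Connector limits above can be checked locally before a configuration or payload is submitted. The following is a minimal sketch, not platform code: the function names are hypothetical, and it assumes that "15MB" means 15 * 1024 * 1024 bytes of UTF-8 encoded JSON, which the platform may measure differently.

```python
import json

# Limits taken from the Connectors table.
MAX_DISPLAY_NAME_CHARS = 50
MAX_SELECTOR_CHARS = 2_000
MAX_COLUMN_HEADER_CHARS = 100
# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
MAX_PUSH_PAYLOAD_BYTES = 15 * 1024 * 1024


def check_connector_limits(display_name, selectors, column_headers):
    """Return a list of limit violations (empty if everything fits)."""
    problems = []
    if len(display_name) > MAX_DISPLAY_NAME_CHARS:
        problems.append(
            f"display name has {len(display_name)} characters "
            f"(max {MAX_DISPLAY_NAME_CHARS})"
        )
    for selector in selectors:
        if len(selector) > MAX_SELECTOR_CHARS:
            problems.append(
                f"selector has {len(selector)} characters "
                f"(max {MAX_SELECTOR_CHARS})"
            )
    for header in column_headers:
        if len(header) > MAX_COLUMN_HEADER_CHARS:
            problems.append(
                f"column header has {len(header)} characters "
                f"(max {MAX_COLUMN_HEADER_CHARS})"
            )
    return problems


def payload_fits_push_limit(payload):
    """Check the serialized size of a JSON payload against the request limit."""
    body = json.dumps(payload).encode("utf-8")
    return len(body) <= MAX_PUSH_PAYLOAD_BYTES
```

A caller might run `check_connector_limits(...)` before saving a Connector and `payload_fits_push_limit(...)` before each push, splitting the payload into smaller requests when the second check fails.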


Crawlers and Crawler Source Connector

Resource | Limits and Validations
PDF File Size | 50 MB
PDF Body File Size | Up to 1 MB of text from a crawled PDF can be ingested by the Connector as the body field. If the limit is exceeded, the data is truncated to the first 1 MB of text.
Maximum Depth | 0 to 100 levels past the root URL that the Crawler will spider
Rate Limit | 1 to 15,000 concurrent tasks
Number of URLs the Crawler will spider through | 100,000 maximum. Once this limit is reached, the Crawler stops.
Domain | 768 characters maximum
Crawler Name | 255 characters maximum
URL Pattern | 1 to 2,000 characters
Page Load Wait Time | The Crawler waits 10 seconds for the page to load (and execute JavaScript) before extracting the HTML
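
The Crawler limits above can be expressed as simple range checks, and the PDF body truncation can be sketched as a byte-length cut. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes the 1MB text limit is measured in UTF-8 bytes (1024 * 1024), which may not match how the platform counts.

```python
# Assumption: the 1MB body limit is measured as 1024 * 1024 UTF-8 bytes.
MAX_PDF_BODY_BYTES = 1024 * 1024


def check_crawler_settings(name, domain, url_pattern, max_depth, rate_limit):
    """Return a list of limit violations for a Crawler configuration."""
    problems = []
    if len(name) > 255:
        problems.append("crawler name exceeds 255 characters")
    if len(domain) > 768:
        problems.append("domain exceeds 768 characters")
    if not 1 <= len(url_pattern) <= 2_000:
        problems.append("URL pattern must be 1 to 2,000 characters")
    if not 0 <= max_depth <= 100:
        problems.append("maximum depth must be 0 to 100")
    if not 1 <= rate_limit <= 15_000:
        problems.append("rate limit must be 1 to 15,000 concurrent tasks")
    return problems


def truncate_pdf_body(text):
    """Keep only the first 1 MB of text, mirroring the described truncation."""
    encoded = text.encode("utf-8")
    if len(encoded) <= MAX_PDF_BODY_BYTES:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded[:MAX_PDF_BODY_BYTES].decode("utf-8", errors="ignore")
```

Cutting on bytes rather than characters keeps the result within the size limit even for non-ASCII text, at the cost of possibly dropping one partial character at the end.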


Unused Crawler Specifications

Any Crawler that has not been used in the last 14 days will have its crawl schedule automatically changed to “Once” after the 14th day of inactivity.

A Crawler is considered unused or inactive if:

  • The Crawler is not linked to a Connector

Or, if the Crawler is linked to a Manually Run Connector and:

  • The linked Connector has not been run in the last 14 days

or

  • The Crawler configuration has not been viewed in the last 14 days

Crawlers that have been set to “Once” after being deemed inactive remain in the platform and can be viewed and run manually at any point. A schedule can also be re-added to the Crawler, but if the Crawler is again considered unused after another 14 days, the schedule will be removed again.
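
The inactivity rule above can be sketched as a small predicate. This is one possible reading, not the platform's implementation: the `Crawler` class and its field names are hypothetical, the schedule names other than "Once" are placeholders, and the sketch assumes that a Crawler linked to a Manually Run Connector is inactive when either of the two 14-day conditions holds. Crawlers linked to any other kind of Connector are treated as active, since the text does not list a condition for them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

INACTIVITY_WINDOW = timedelta(days=14)


@dataclass
class Crawler:
    # Hypothetical model; field names are illustrative only.
    schedule: str
    linked_to_connector: bool
    connector_is_manually_run: bool = False
    connector_last_run: Optional[datetime] = None
    config_last_viewed: Optional[datetime] = None


def _stale(timestamp, now):
    """True if the event never happened or is more than 14 days old."""
    return timestamp is None or now - timestamp > INACTIVITY_WINDOW


def is_unused(crawler, now):
    """Apply the inactivity conditions under the assumptions stated above."""
    if not crawler.linked_to_connector:
        return True
    if crawler.connector_is_manually_run:
        # Assumed reading: either stale condition marks the Crawler unused.
        return (_stale(crawler.connector_last_run, now)
                or _stale(crawler.config_last_viewed, now))
    return False


def apply_inactivity_policy(crawler, now):
    """Reset the schedule to "Once" for an unused Crawler; keep the Crawler."""
    if is_unused(crawler, now):
        crawler.schedule = "Once"
    return crawler
```

Because the policy only changes the schedule, re-adding a schedule later simply makes the Crawler subject to the same check again after the next 14-day window.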