Connector System Limits | Yext Hitchhikers Platform
Connectors
Object | Limits and Validations |
---|---|
Function invocation (for source and transform) | 30-second timeout |
Connector Display Name | 50 characters maximum |
Selectors | 2,000 characters maximum |
Column Headers | 100 characters maximum |
JSON Payload for Push to API Connector | 15MB maximum request size |
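
The limits in the table above can be checked on the client side before a Connector is configured or a payload is pushed. The following is a minimal sketch, not part of the Yext platform or any Yext SDK: the function name `check_connector_limits` and its parameters are hypothetical, and "15MB" is assumed to mean 15 * 1024 * 1024 bytes of UTF-8 encoded JSON.

```python
import json

# Limits taken from the Connectors table above.
MAX_DISPLAY_NAME_CHARS = 50
MAX_SELECTOR_CHARS = 2_000
MAX_COLUMN_HEADER_CHARS = 100
# Assumption: "15MB" is interpreted as 15 * 1024 * 1024 bytes.
MAX_PUSH_PAYLOAD_BYTES = 15 * 1024 * 1024


def check_connector_limits(display_name, selectors, column_headers, payload):
    """Return a list of limit violations (empty if everything fits).

    Hypothetical helper for illustration only.
    """
    problems = []

    if len(display_name) > MAX_DISPLAY_NAME_CHARS:
        problems.append("display name exceeds 50 characters")

    for selector in selectors:
        if len(selector) > MAX_SELECTOR_CHARS:
            problems.append("selector exceeds 2,000 characters")

    for header in column_headers:
        if len(header) > MAX_COLUMN_HEADER_CHARS:
            problems.append("column header exceeds 100 characters")

    # Measure the payload as it would be sent: serialized, UTF-8 encoded.
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_PUSH_PAYLOAD_BYTES:
        problems.append("JSON payload exceeds 15MB")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass.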
Crawlers and Crawler Source Connector
Resource | Limits and Validations |
---|---|
PDF File Size | 50MB |
PDF Body File Size | Up to 1MB of text from a crawled PDF can be ingested by a Connector as the body field. If this limit is exceeded, the data is truncated to the first 1MB of text. |
Maximum Depth | 0 - 100 levels past the root URL that the Crawler will spider |
Rate Limit | 1 - 15,000 concurrent tasks |
Number of URLs the Crawler will spider through | 100,000 maximum. Once this limit is reached, the Crawler will stop. |
Domain | 768 characters maximum |
Crawler Name | 255 characters maximum |
URL Pattern | 1 - 2,000 characters |
Page Load Wait Time | The Crawler will wait 10 seconds for the page to load (and execute JavaScript) before extracting the HTML |
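
To make the Crawler limits above concrete, here is a minimal sketch of range checks for Crawler settings and of the PDF body truncation rule. It is illustrative only: `check_crawler_settings` and `truncate_pdf_body` are hypothetical names, not Yext APIs, and "1MB of text" is assumed to mean 1024 * 1024 bytes of UTF-8 encoded text (the platform may measure it differently).

```python
# Assumption: "1MB of text" is interpreted as 1024 * 1024 UTF-8 bytes.
MAX_PDF_BODY_BYTES = 1024 * 1024


def truncate_pdf_body(text, limit=MAX_PDF_BODY_BYTES):
    """Keep only the first `limit` bytes of text, as the table describes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    return encoded[:limit].decode("utf-8", errors="ignore")


def check_crawler_settings(name, domain, url_patterns, max_depth, rate_limit):
    """Return a list of limit violations (empty if the settings fit).

    Hypothetical helper; the ranges come from the table above.
    """
    problems = []
    if len(name) > 255:
        problems.append("crawler name exceeds 255 characters")
    if len(domain) > 768:
        problems.append("domain exceeds 768 characters")
    for pattern in url_patterns:
        if not 1 <= len(pattern) <= 2_000:
            problems.append("URL pattern must be 1 to 2,000 characters")
    if not 0 <= max_depth <= 100:
        problems.append("maximum depth must be between 0 and 100")
    if not 1 <= rate_limit <= 15_000:
        problems.append("rate limit must be between 1 and 15,000")
    return problems
```

Truncating on encoded bytes rather than on characters keeps the result within the byte budget even when the text contains multi-byte characters.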
Unused Crawler Specifications
Any Crawler that has not been used in the last 14 days will have its crawl schedule automatically changed to “Once” after the 14th day of inactivity.
A Crawler is considered unused or inactive if either of the following is true:
- The Crawler is not linked to a Connector
- The Crawler is linked to a Manually Run Connector and either:
  - The linked Connector has not been run in the last 14 days, or
  - The Crawler configuration has not been viewed in the last 14 days
Crawlers that have been set to “Once” after being deemed inactive will remain in the platform and can be viewed and run manually at any point. A schedule can also be re-added to the Crawler, but if the Crawler is considered unused again after another 14 days, the schedule will be removed again.
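
The inactivity conditions in this section can be expressed as a small predicate. This is a sketch of one reading of the rules, not the platform's actual implementation: the `CrawlerActivity` record and `is_unused` function are hypothetical, a missing timestamp is assumed to count as "not in the last 14 days", and a Crawler linked to a Connector that is not manually run is assumed to be active because the rules name no inactivity condition for it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

INACTIVITY_WINDOW = timedelta(days=14)


@dataclass
class CrawlerActivity:
    """Hypothetical record of the facts the inactivity rules depend on."""
    linked_to_connector: bool
    connector_is_manual: bool = False
    last_connector_run: Optional[datetime] = None
    last_config_view: Optional[datetime] = None


def is_unused(crawler: CrawlerActivity, now: datetime) -> bool:
    """Apply the unused/inactive conditions as read from the text above."""
    # Condition 1: not linked to any Connector.
    if not crawler.linked_to_connector:
        return True
    # Assumption: only Manually Run Connectors are subject to condition 2.
    if not crawler.connector_is_manual:
        return False
    cutoff = now - INACTIVITY_WINDOW
    # Assumption: a missing timestamp means the event never happened.
    run_stale = (crawler.last_connector_run is None
                 or crawler.last_connector_run < cutoff)
    view_stale = (crawler.last_config_view is None
                  or crawler.last_config_view < cutoff)
    # Condition 2: the Connector was not run OR the config was not viewed.
    return run_stale or view_stale
```

Passing `now` explicitly keeps the check deterministic, which makes it straightforward to test against fixed dates.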