Data Connector Best Practices | Hitchhikers Platform
What You’ll Learn
In this section, you will learn:
- When to use a Crawler
- When not to use a Crawler
You’ve learned a lot about Data Connectors at this point. If you’ve skipped all of the units until now, the tl;dr is that Data Connectors are designed to make building your Knowledge Graph easy and fast.
The Site Crawler is just one of the available Data Connector sources, and there are some important things to keep in mind as you navigate your options.
The “best” way to integrate your data is also the one that takes the most upfront work, so you need to weigh that cost against the benefit with your team when choosing your solution.
A Crawler might be a good option, especially if you’re just getting started, creating a demo, or pulling in long-form, unstructured content.
When to use a Crawler
The Crawler is perfect for pulling in unstructured long-form content like blog posts, help articles, or press releases that are written and published elsewhere but that you want to surface in your Answers experience. This saves you from having to make each update in two places, while still making the content searchable programmatically.
For more structured entity types, like locations or products, the Crawler can still work as long as the pages being crawled are highly structured and don’t change frequently or unpredictably.
When not to use a Crawler
The Crawler might not be the best fit for you if the structure or the CSS of the pages being crawled changes frequently and without notice to you or your team. A change like that could break the data mappings and make them difficult to update.
Also, if the entity type is highly structured, but your pages are not, you won’t be able to reliably create mappings from your webpage to your entity schema. For these, we recommend going straight to the source.
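To make the mapping problem concrete, here is a minimal Python sketch of a selector-based extraction. It is not the Crawler's actual implementation; the class names (`adviser-name`, `profile-title`), the sample pages, and the `extract` helper are all hypothetical and exist only to show how a field mapped to a CSS class stops returning data once that class is renamed.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements that carry a given CSS class.

    Simplified on purpose: assumes well-formed markup with no void
    elements (such as <br>) inside a matching element.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._depth = 0  # greater than 0 while inside a matching element
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif self.css_class in classes:
            self._depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data


def extract(html, css_class):
    """Return the stripped text of every element with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical page before and after a redesign that renamed one class.
OLD_PAGE = '<div><h1 class="adviser-name">Ana Ruiz</h1><p class="adviser-city">Austin</p></div>'
NEW_PAGE = '<div><h1 class="profile-title">Ana Ruiz</h1><p class="adviser-city">Austin</p></div>'

before = extract(OLD_PAGE, "adviser-name")  # mapping finds the name
after = extract(NEW_PAGE, "adviser-name")   # mapping silently finds nothing
```

In this sketch the page still shows the same name to a visitor, but the mapping keyed on the old class returns an empty list, which is the kind of quiet breakage to watch for when the site changes without notice.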
Another consideration is update latency. The Crawler does not pull in updates from your website in real time, so if an entity type needs to be updated quickly, you should consider another method, like an App or an API integration.
One-time imports can be powerful!
Don’t be afraid to use Data Connectors for a one-time import of your data. This is especially useful for giving you a starting point, even if it means you need to clean up your data afterward. For example, let’s say you want to pull in your Wealth Advisers to get started, but the pages aren’t structured well and the source system you use doesn’t let you export easily, so you’re having trouble generating a flat file for upload. You can set up a crawler, sync the data to entities, and then export the content to Excel to clean it up a bit. Then you have your flat file and you’re ready to go!
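If you would rather script the cleanup than do it by hand in a spreadsheet, a minimal Python sketch might look like the one below. Everything here is an assumption made for illustration: the column names (`entity_id`, `name`, `phone`), the sample rows, and the cleanup rules (trim whitespace, drop rows with no ID, drop repeated IDs). Your real export will have its own columns and its own problems.

```python
import csv
import io


def clean_rows(raw_csv):
    """Trim whitespace, drop rows without an ID, and drop repeated IDs."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen_ids = set()
    cleaned = []
    for row in reader:
        row = {key: (value or "").strip() for key, value in row.items()}
        entity_id = row["entity_id"]
        if not entity_id or entity_id in seen_ids:
            continue  # unusable or duplicate row
        seen_ids.add(entity_id)
        # Collapse runs of spaces inside the name.
        row["name"] = " ".join(row["name"].split())
        cleaned.append(row)
    return cleaned


def to_csv(rows, fieldnames):
    """Serialize cleaned rows back into flat-file (CSV) text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical messy export produced after a one-time crawl.
RAW_EXPORT = """entity_id,name,phone
adv-1,  Ana   Ruiz ,555-0100
adv-2,Ben Cole,555-0101
adv-2,Ben Cole,555-0101
,Unknown,
"""

cleaned = clean_rows(RAW_EXPORT)
flat_file = to_csv(cleaned, ["entity_id", "name", "phone"])
```

With the sample data above, the duplicate `adv-2` row and the row with no ID are dropped, leaving two clean records ready for upload.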
Entity IDs are very important
We covered unique identifiers in the Ways to Update Data in Yext unit, but the importance of the Entity ID cannot be overstated!
Unique identifiers are how systems know whether to add, update, or delete records. We trigger updates to existing entities based on IDs. So, whenever you adjust your configurations, take extra care around the IDs. Otherwise, there can be unintended consequences: entities that should be updated are not, or the wrong entities are updated.
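The sketch below illustrates the general idea of ID-based matching in Python. It is a simplified model and not how Yext processes a connector run; the `plan_sync` function, the ID formats, and the sample advisers are all hypothetical. It compares the IDs already in a system with the IDs arriving from a source and sorts them into adds, updates, and deletes.

```python
def plan_sync(existing, incoming):
    """Classify records as add / update / delete by comparing IDs.

    Both arguments map an entity ID to a dict of field values.
    """
    to_add = sorted(incoming.keys() - existing.keys())
    to_delete = sorted(existing.keys() - incoming.keys())
    to_update = sorted(
        entity_id
        for entity_id in incoming.keys() & existing.keys()
        if incoming[entity_id] != existing[entity_id]
    )
    return {"add": to_add, "update": to_update, "delete": to_delete}


# Hypothetical entities already in the system.
existing = {
    "adviser-001": {"name": "Ana Ruiz", "city": "Austin"},
    "adviser-002": {"name": "Ben Cole", "city": "Boston"},
}

# Same IDs as before; one adviser moved cities.
incoming_stable_ids = {
    "adviser-001": {"name": "Ana Ruiz", "city": "Austin"},
    "adviser-002": {"name": "Ben Cole", "city": "Denver"},
}

# Same people, but a configuration change altered the ID format.
incoming_changed_ids = {
    "ADV-1": {"name": "Ana Ruiz", "city": "Austin"},
    "ADV-2": {"name": "Ben Cole", "city": "Denver"},
}

stable_plan = plan_sync(existing, incoming_stable_ids)
changed_plan = plan_sync(existing, incoming_changed_ids)
```

With stable IDs, the plan contains a single update for `adviser-002`. With the changed ID format, the same two people show up as two new records, and the two original records no longer match anything, so the intended update never reaches them.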