Extract — Source, Source Settings & Selectors | Yext Hitchhikers Platform
What You’ll Learn
In this section, you will learn:
- An overview of the available Connector sources
- How to configure source settings to pull data from the correct endpoint
- How to specify selectors to pull in the relevant data components
The first step to create a Connector is selecting your desired Connector Source. Then you will select the entities you want to create or update with this Connector.
In this Data Connectors framework we have a variety of sources including:
- The Yext Site Crawler
- Third-party Sources like Zendesk or Confluence
These data origins are all sources.
The Crawler is a tool that helps scrape web pages for their HTML content. Once a crawl has successfully run on a set of web pages, the Add Data flow can help convert that raw HTML content into entities in the Knowledge Graph.
We will go into more detail on how to set up the Crawler in the next unit.
This method is used to pull any relevant data from your website into the Knowledge Graph, either to manage the content as entities in the Knowledge Graph or to surface the data in Search experiences.
We support two API Sources: Push to API and Pull from API.
Pull from API
This source allows you to pull data from any API and use the Connector to convert that data into entities without having to build and host an integration outside of Yext.
To do this, click Add Data, select Pull from API, and specify the details of the API, such as the request URL, authentication method, and query parameters.
This method can be used to pull relevant data from any system that exposes it via an API. It is particularly suited to pulling data from a public domain at a regular cadence.
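As a rough illustration of the details mentioned above, the sketch below shows how a request URL, query parameters, and an authentication header combine into a single GET request. The `PullFromApiSettings` shape, the example URL, and the `X-Api-Key` header are all assumptions made for illustration; they are not Yext's configuration schema.

```typescript
// Hypothetical shape for the details entered in the Pull from API settings.
// These field names are illustrative and are not Yext's configuration schema.
interface PullFromApiSettings {
  requestUrl: string;
  queryParams: Record<string, string>;
  // Only one illustrative authentication method (an API key header) is modelled.
  apiKeyHeader?: { name: string; value: string };
}

// Append the query parameters to the request URL, URL-encoding keys and values.
function buildRequestUrl(settings: PullFromApiSettings): string {
  const pairs = Object.entries(settings.queryParams).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`
  );
  if (pairs.length === 0) return settings.requestUrl;
  const separator = settings.requestUrl.includes("?") ? "&" : "?";
  return settings.requestUrl + separator + pairs.join("&");
}

// Build the headers that would accompany the GET request.
function buildHeaders(settings: PullFromApiSettings): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (settings.apiKeyHeader) {
    headers[settings.apiKeyHeader.name] = settings.apiKeyHeader.value;
  }
  return headers;
}

const settings: PullFromApiSettings = {
  requestUrl: "https://api.example.com/v1/articles", // placeholder endpoint
  queryParams: { status: "published", per_page: "50" },
  apiKeyHeader: { name: "X-Api-Key", value: "YOUR_KEY" },
};

console.log(buildRequestUrl(settings));
// https://api.example.com/v1/articles?status=published&per_page=50
```

Keeping the base URL and the query parameters separate, as the settings screen does, makes it easier to change one filter without retyping the whole endpoint.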
Push to API
This option allows you to push data to an endpoint either via a regular API call or a Webhook message.
We will go into more detail on how to leverage the API Connector Sources in the next unit.
We will also be adding more Data Connectors over time. If there is a Data Connector that you’d like to see added, let us know in the Community!
The Function source allows you to write a fully custom TypeScript function that can serve as the data source for your Connector.
To learn more about how to add a Function to your account and use it as a Connector data source, visit the Get Started with Functions guide.
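To give a feel for what a custom TypeScript function acting as a data source might look like, here is a minimal sketch that returns records one page at a time. The function name, the record fields, and the page-token contract are all assumptions for illustration; the exact signature and return format Yext expects are covered in the guide, not in this unit.

```typescript
// Hypothetical record shape; real field names depend on your data.
interface FaqRecord {
  id: string;
  question: string;
  answer: string;
}

// Stand-in data. A real function would typically call an external system here.
const ALL_RECORDS: FaqRecord[] = [
  { id: "faq-1", question: "What are your hours?", answer: "9am to 5pm." },
  { id: "faq-2", question: "Do you ship abroad?", answer: "Yes." },
  { id: "faq-3", question: "Can I return an item?", answer: "Within 30 days." },
];

const PAGE_SIZE = 2;

// Assumed contract (not confirmed by this unit): the function receives an
// optional page token and returns one page of records plus the next token.
function faqSource(pageToken?: string): { data: FaqRecord[]; nextPageToken?: string } {
  const start = pageToken ? parseInt(pageToken, 10) : 0;
  const data = ALL_RECORDS.slice(start, start + PAGE_SIZE);
  const next = start + PAGE_SIZE;
  // Only include a token when there are more records left to return.
  return next < ALL_RECORDS.length ? { data, nextPageToken: String(next) } : { data };
}

const firstPage = faqSource();
const secondPage = faqSource(firstPage.nextPageToken);
console.log(firstPage.data.length, secondPage.data.length); // 2 1
```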
Third Party Sources
You can build custom Connectors using these source options.
If you are looking to install a pre-built connector, you can do so by installing the App through the App Directory.
The primary step you will take is to designate the operation you would like to perform and where we should pull the dataset from (e.g., with Zendesk you will have the option to ‘Fetch Help Articles’, and you will need to enter your Zendesk subdomain).
Then, you can use selectors to identify exactly which pieces of data you’d like to pull into your entities in the Knowledge Graph, as well as apply any transforms to your data.
To leverage these methods, your data must already be stored in these third-party apps, which makes this a very easy way to pull that data into Yext.
Once you select your source you will need to configure your desired source settings. This will look different depending on the source you choose.
For Site Crawler, you will need to select the Crawler you want to extract data from and the desired site extraction settings.
If you choose Pull from API as your source, you will need to enter the GET Request URL and the desired Query Parameters for the API.
Once you have specified the details needed to pull the data in, you will then determine which specific pieces of data you want to pull in.
You can choose to Add Default Selectors and we will pull in all identified selectors. However, you also have the option to make any adjustments to those as needed.
For the Site Crawler you can use built-in selectors to pull in data like page URL, or you can use CSS or XPath selectors to extract different elements from the site.
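To make the difference between CSS and XPath selectors concrete, the sketch below lists a few selectors of each kind for an imagined help-article page. The `Selector` type, the header names, and the page structure (`h1.article-title`, `div.article-body`) are hypothetical; they only illustrate the two syntaxes and do not reflect how Yext stores selectors.

```typescript
type SelectorKind = "css" | "xpath";

// Hypothetical representation of one selector: a column header plus a path.
interface Selector {
  header: string;
  kind: SelectorKind;
  path: string;
}

const selectors: Selector[] = [
  // CSS: the <h1> element whose class list contains "article-title".
  { header: "Title", kind: "css", path: "h1.article-title" },
  // XPath: similar target, but this form only matches when the class
  // attribute is exactly "article-title".
  { header: "Title (XPath)", kind: "xpath", path: "//h1[@class='article-title']" },
  // CSS: every <p> nested anywhere inside the article body container.
  { header: "Body paragraphs", kind: "css", path: "div.article-body p" },
  // XPath can select an attribute value directly, which plain CSS cannot.
  { header: "Canonical URL", kind: "xpath", path: "//link[@rel='canonical']/@href" },
];

// Return the headers of all selectors written in the given syntax.
function headersOfKind(kind: SelectorKind): string[] {
  return selectors.filter((s) => s.kind === kind).map((s) => s.header);
}

console.log(headersOfKind("css"));   // [ 'Title', 'Body paragraphs' ]
console.log(headersOfKind("xpath")); // [ 'Title (XPath)', 'Canonical URL' ]
```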
For Pull from API Connectors, you will specify a JMESPath expression to extract a specific element from your API response.
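To show what such an expression does, the sketch below evaluates two simple JMESPath-style expressions against a sample response. It is a hand-written evaluator for a tiny subset of the syntax (dotted field names, `[n]` indexes, and `[*]` projections), written only so the example runs without dependencies; it is not the JMESPath library and does not validate expressions as strictly as a real parser. The sample response and its field names are made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type Step =
  | { kind: "field"; name: string }
  | { kind: "index"; index: number }
  | { kind: "wildcard" };

// Split an expression such as "articles[*].title" into steps.
function parse(expression: string): Step[] {
  const steps: Step[] = [];
  const tokenPattern = /\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\[\*\]/g;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(expression)) !== null) {
    if (match.index !== consumed) break; // unsupported characters were skipped
    if (match[1] !== undefined) steps.push({ kind: "field", name: match[1] });
    else if (match[2] !== undefined) steps.push({ kind: "index", index: parseInt(match[2], 10) });
    else steps.push({ kind: "wildcard" });
    consumed = tokenPattern.lastIndex;
  }
  if (consumed !== expression.length) {
    throw new Error(`Unsupported expression: ${expression}`);
  }
  return steps;
}

// Apply the steps to a value; missing data yields null, as in JMESPath.
function evaluate(value: Json, steps: Step[]): Json {
  const step = steps[0];
  if (step === undefined) return value;
  const rest = steps.slice(1);
  if (step.kind === "field") {
    if (value === null || typeof value !== "object" || Array.isArray(value)) return null;
    const child = value[step.name];
    return child === undefined ? null : evaluate(child, rest);
  }
  if (!Array.isArray(value)) return null;
  if (step.kind === "index") {
    const element = value[step.index];
    return element === undefined ? null : evaluate(element, rest);
  }
  // Wildcard: apply the remaining steps to every element and drop nulls.
  return value.map((item) => evaluate(item, rest)).filter((r) => r !== null);
}

function search(data: Json, expression: string): Json {
  return evaluate(data, parse(expression));
}

// Made-up API response for illustration.
const response: Json = {
  articles: [
    { id: 101, title: "Reset your password", author: { name: "Ana" } },
    { id: 102, title: "Update billing details", author: { name: "Ben" } },
  ],
};

console.log(search(response, "articles[*].title"));
// [ 'Reset your password', 'Update billing details' ]
console.log(search(response, "articles[0].author.name")); // Ana
```

The first expression is the typical pattern for a Connector: it selects one field from every record in a list, so each article becomes a row of extracted data.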
That covers the ‘extract’ part of the process — the next unit will go into how to transform the data that you have extracted from the source before you load it into Yext.