Extract — Source, Source Settings & Selectors | Yext Hitchhikers Platform
What You’ll Learn
In this section, you will learn:
- An overview of the available Connector sources
- How to configure source settings to pull data from the correct endpoint
- How to specify selectors to pull in the relevant data components
Overview
The first step in creating a connector is selecting your data source. You then select the entities you want to create or update with this connector.
Data Sources
There are many data sources available in the Connectors framework. These are divided into generic sources (data sources that do not involve third-party apps) and native sources (data sources that connect to third-party apps).
A complete list of the available sources can be found in the Connectors reference section, under Generic Sources and Native Sources. We’ll go over a few of the more common options here.
File Upload
The File Upload connector source allows you to upload Excel, CSV, and JSON files to Yext. Using this method allows you to clean and transform your data using the Connectors framework before importing it to Yext.
FTP/SFTP File Pickup
Like the File Upload source, the FTP/SFTP source brings Excel, CSV, and JSON files into Yext. The difference is that this source retrieves those files directly from where they are stored on your FTP, FTPS, or SFTP server.
Crawler
The Crawler connector source scrapes web pages for their HTML content. Once a crawl has successfully run on a set of web pages, the Add Data flow can help convert that raw HTML content into entities in Yext Content.
Use this method to pull relevant data from your website into Yext, either to manage the content as entities or to surface the data in Search experiences.
We will go into more detail on how to set up the crawler in the next unit.
API
We support two API sources: Push to API and Pull from API.
Pull from API
This source allows you to pull data from any API and use the Connector to convert that data into entities without having to build and host an integration outside of Yext.
To do this, click Add Data, select Pull from API, and specify the details of the API, such as the request URL, authentication method, and query parameters.
Use this method with any system that exposes its data through an API. It is especially useful for pulling data from a public domain at a regular cadence.
Push to API
This option allows you to push data to an endpoint either via a regular API call or a Webhook message.
We will go into more detail on how to use the API connector sources in the next unit.
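As a rough illustration of what a "regular API call" for the Push to API option could look like, the sketch below only builds a JSON POST request and does not send it. The endpoint URL, the record fields, and the helper name `build_push_request` are all hypothetical placeholders; the real endpoint, authentication, and payload shape come from your connector's configuration.

```python
import json
from urllib.request import Request


def build_push_request(endpoint: str, records: list[dict]) -> Request:
    """Build (but do not send) a JSON POST request carrying the records."""
    body = json.dumps(records).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_push_request(
    "https://example.com/connector-endpoint",
    [{"id": "loc-1", "name": "Downtown Store"}],
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since the placeholder endpoint does not accept data.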
Functions
The Function source allows you to write a fully custom TypeScript function that can serve as the data source for your connector.
Native Sources
Native sources allow you to build a custom connector to retrieve data from a third-party platform (such as Adobe Commerce, Google Business Profile, and others).
Connectors using native sources are slightly different from the apps in the App Directory. The App Directory contains pre-built connectors that are intended to be used out of the box. If you want a quick-start option, the App Directory may be the right choice. However, if you want to customize how the connector works with your third-party source, you may prefer to create a connector with a native source.
The primary step is to designate the operation you would like to perform and where the dataset should be pulled from. For example, with Zendesk you will have the option to ‘Fetch Help Articles’, and you will need to enter your Zendesk subdomain.
Then, you can use selectors to identify exactly which pieces of data you’d like to pull in to your entities in Yext Content, as well as apply any transforms to your data.
To use these sources, your data must already be stored in the third-party app. When it is, this is a very easy way to pull that data into Yext.
Source Settings
Once you select your source you will need to configure your desired source settings. This will look different depending on the source you choose.
For Site Crawler you will need to select the Crawler you want to extract data from, and the desired site extraction settings.
If you choose Pull from API as your source, you will need to enter the GET Request URL and any desired Query Parameters for the API.
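To make the relationship between the request URL and the query parameters concrete, here is a minimal sketch of how the two combine into the final GET URL. The endpoint and parameter names are hypothetical, and `build_request_url` is an illustrative helper, not part of Yext.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request_url(base_url: str, query_params: dict[str, str]) -> str:
    """Merge query parameters into a GET request URL."""
    parts = urlsplit(base_url)
    # Keep parameters already present on the base URL, then add or override.
    merged = dict(parse_qsl(parts.query))
    merged.update(query_params)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(merged), parts.fragment)
    )


url = build_request_url(
    "https://api.example.com/v1/articles",   # hypothetical endpoint
    {"limit": "50", "status": "published"},  # hypothetical parameters
)
print(url)  # https://api.example.com/v1/articles?limit=50&status=published
```

Checking the assembled URL in a browser or API client before entering it in the source settings is a quick way to confirm that the endpoint returns the data you expect.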
Specify Selectors
Once you have specified the details needed to pull the data in, you will then determine which specific pieces of data you want to pull in using selectors.
You can choose Add Default Selectors to pull in every selector that is identified automatically. You can then adjust those selectors as needed.
For the Crawler, you can use built-in selectors to pull in data like page URL, or you can use CSS or XPath selectors to extract different elements from the site.
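To show what an XPath selector picks out of a page, the sketch below runs two expressions against a small, well-formed HTML fragment. The page content and class names are hypothetical. Python's standard `xml.etree.ElementTree` supports only a limited subset of XPath and is used here purely for illustration, so it may not match how the Crawler evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for crawled HTML.
PAGE = """
<html>
  <body>
    <h1 class="title">Store Hours</h1>
    <div class="faq">
      <p class="question">When do you open?</p>
      <p class="answer">9 AM on weekdays.</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())


def select_text(xpath: str) -> list[str]:
    """Return the text of every element matched by a (limited) XPath expression."""
    return [el.text.strip() for el in root.findall(xpath) if el.text]


title = select_text(".//h1[@class='title']")
answers = select_text(".//div[@class='faq']/p[@class='answer']")
print(title)    # ['Store Hours']
print(answers)  # ['9 AM on weekdays.']
```

The second expression narrows the match to answer paragraphs inside the FAQ block, which is the kind of targeting a selector needs when a page repeats the same tag in several places.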
For Pull from API connectors, you will specify a JMESPath expression to extract a specific element from your API response.
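The idea of a path expression selecting one element from a JSON response can be sketched as follows. This is a simplified stand-in that handles only dotted keys and numeric indexes such as `data.articles[0].title`, which is a small subset of JMESPath syntax and not a full implementation. The response body and field names are hypothetical.

```python
import json
import re

# Hypothetical API response; the field names are illustrative only.
RESPONSE = json.loads("""
{
  "data": {
    "articles": [
      {"id": 101, "title": "Returns policy"},
      {"id": 102, "title": "Shipping times"}
    ]
  }
}
""")

# Matches either an identifier (a key) or a bracketed number (a list index).
_TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(expression: str, document):
    """Resolve a dotted path with optional [n] indexes; return None if missing."""
    current = document
    for key, index in _TOKEN.findall(expression):
        try:
            current = current[key] if key else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


print(extract("data.articles[1].title", RESPONSE))  # Shipping times
print(extract("data.articles[0].id", RESPONSE))     # 101
print(extract("data.missing", RESPONSE))            # None
```

Full JMESPath also supports projections, filters, and functions, so expressions used in a real connector can be considerably more expressive than this subset.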
That covers the ‘extract’ part of the process — the next unit will go into how to transform the data that you have extracted from the source before you load it into Yext.