Module Assessment | Yext Hitchhikers Platform
The Turtlehead Tacos team maintains a separate Help Site for people with questions about online ordering or the mobile app. The site has its own content management process, which you don’t want to disrupt, but you know that eventually you’ll want to surface answers to these support questions in your Search experience. To prepare for that, you’ll set up a crawler for the help article pages so you can start ingesting the content. Later on, you can worry about adding this to your Search experience.
The help articles you’ll be working with are on help.turtleheadtacos.com.
Navigate to Content > Configuration > Crawlers
Click +New Crawler
Fill out the crawler settings:
- Name: “Help Articles”
- Schedule: “Once” – for now you’re just going to do a one-time import, but later you may update this to do it daily or weekly.
- Crawl Strategy: Sub-Pages – you only want to crawl this part of the site.
- Domain to Crawl: https://help.turtleheadtacos.com – make sure not to add a trailing “/” on the URL
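If it helps to keep a record of what you entered, the settings above can be sketched as a small Python structure. This is only an illustration: `CrawlerSettings` and its field names are hypothetical and are not part of any Yext API. The one check it performs mirrors the note about not adding a trailing “/”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerSettings:
    """Hypothetical record of the form fields above; not a Yext API object."""

    name: str
    schedule: str
    crawl_strategy: str
    domain: str

    def __post_init__(self):
        # The instructions say not to add a trailing "/" to the domain.
        if self.domain.endswith("/"):
            raise ValueError("domain must not end with '/'")


settings = CrawlerSettings(
    name="Help Articles",
    schedule="Once",
    crawl_strategy="Sub-Pages",
    domain="https://help.turtleheadtacos.com",
)
print(settings)
```

Constructing the same object with `https://help.turtleheadtacos.com/` as the domain would raise a `ValueError` in this sketch.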
Click “Save Crawler”. Congrats, you’ve created a Crawler!
It may take a few moments but you should see a Crawl in Progress. It should be complete once you see that 6 pages have been crawled. Wait until it completes (you might need to refresh the page). If you run into any errors, please reach out in the Community.
Now that you’ve set up the Crawler and extracted the HTML from the pages, you can set up a Data Connector using the Crawler as a Source. Before you do that, you need to enable the Help Article entity type. Navigate to Content > Configuration > Entity Types and enable Help Article.
Navigate to Content > Connectors
Click +Add Connector
Select Site Crawler
Set your Crawler Extraction Settings and then click “Continue”
- Crawler: Help Articles (the Crawler you set up in Steps 2-4)
- URLs: Select Specific URLs or URL Patterns and enter https://help.turtleheadtacos.com/* – this will pull all of the help articles but not the homepage
- Page Type: Detail page - each of the help articles is on its own page
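To see why the URL pattern picks up the articles but skips the homepage, here is a rough sketch using Python's `fnmatch` wildcard matching. It assumes the homepage was crawled as the bare domain (no trailing “/”) and that Yext's pattern matching behaves like a simple wildcard match, which may not be exactly how the platform works. The article URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "https://help.turtleheadtacos.com/*"

# Hypothetical crawled URLs: the homepage plus five article pages.
crawled = [
    "https://help.turtleheadtacos.com",
    "https://help.turtleheadtacos.com/article-1",
    "https://help.turtleheadtacos.com/article-2",
    "https://help.turtleheadtacos.com/article-3",
    "https://help.turtleheadtacos.com/article-4",
    "https://help.turtleheadtacos.com/article-5",
]

# Keep only the URLs that match the wildcard pattern.
selected = [url for url in crawled if fnmatchcase(url, PATTERN)]

print(len(crawled), "crawled,", len(selected), "selected")
```

Under these assumptions, six pages are crawled and five are selected, because the bare domain has nothing after it for `/*` to match.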
Set Specific Selectors. Click “Add Default Selectors”. You’ll see the Page ID, Page URL and Page Title pulled. You’ll want to add a couple more selectors:
Add a selector with Header of “Body” and use the “Cleaned Body Content” as your specified Path. This will pull in the help body itself.
Add a selector using CSS Selector with Header of “Tags”. You want to pull in the list of tags at the bottom of each help article. Inspect the page and/or use https://try.jsoup.org/ to find the right CSS Selector to pull the list in. To modify your selector, remember that you can hover over the Tags column and click the pencil icon to edit the selector and try again. If you can’t figure it out, you can find the CSS Selector in .
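If you are new to CSS selectors, the sketch below shows the idea on made-up markup. It assumes the tags sit in a `<ul class="tags">` list, in which case a selector such as `ul.tags li` would target each tag. Both the markup and that selector are hypothetical, so you will still need to inspect the real page. The code uses Python's built-in `html.parser` to mimic what such a selector would return; it is not a CSS engine.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real help pages may be structured differently.
SAMPLE = """
<article>
  <p>How to reset your password.</p>
  <ul class="tags">
    <li>account</li>
    <li>mobile app</li>
  </ul>
</article>
"""


class TagCollector(HTMLParser):
    """Collects the text of <li> elements inside <ul class="tags">."""

    def __init__(self):
        super().__init__()
        self.in_tag_list = False
        self.in_item = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "tags" in classes:
            self.in_tag_list = True
        elif tag == "li" and self.in_tag_list:
            self.in_item = True
            self.tags.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False
        elif tag == "ul":
            self.in_tag_list = False

    def handle_data(self, data):
        # Only accumulate text while inside a tag list item.
        if self.in_item:
            self.tags[-1] += data


collector = TagCollector()
collector.feed(SAMPLE)
tags = [t.strip() for t in collector.tags]
print(tags)  # ['account', 'mobile app']
```

The point to notice is that the result is a list with one entry per tag, which is the shape you want the Tags selector to produce.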
Click Continue at the bottom of the page
Select Help Article as the Entity Type
Now it’s time to map the selectors to fields on your Help Article Entity Type. You’ll see the Selectors from the last step in the first column, with a preview of the data in the second column. Update the “Map to Yext Field” column with the following and then click “Save”:
- Page ID → Entity ID
- Page URL → Landing Page URL
- Page Title → Name
- Body → Body
- Tags → Keywords (don’t check off the option – you want each tag to be in its own list item)
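The mapping above can be pictured as a simple lookup from selector header to field name. The sketch below is illustrative only: it uses the display names shown in the UI rather than any real Yext field IDs, and the sample row is made up.

```python
# Selector header -> Yext field, using the display names from the list above.
FIELD_MAP = {
    "Page ID": "Entity ID",
    "Page URL": "Landing Page URL",
    "Page Title": "Name",
    "Body": "Body",
    "Tags": "Keywords",
}


def to_entity(row):
    """Rename each selector header in a crawled row to its mapped field."""
    return {FIELD_MAP[header]: value for header, value in row.items()}


# Hypothetical crawled row for a single help article.
row = {
    "Page ID": "example-article",
    "Page URL": "https://help.turtleheadtacos.com/example-article",
    "Page Title": "Example Article",
    "Body": "Example body text.",
    "Tags": ["account", "mobile app"],
}

entity = to_entity(row)
print(entity["Keywords"])  # ['account', 'mobile app']
```

Note that Tags stays a list, so each tag ends up as its own item in Keywords.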
Click Save at the bottom of the page. You will be prompted to enter a Name and ID. Enter the following:
- Name: Help Articles
- ID: helpArticles
Click “Save & Run Now” to pull the entities into your Graph!
Monitor your run to make sure it is successful. You should see 5 successful adds! Click Content > Entities to see your new “Help Article” entities.