Module Assessment | Yext Hitchhikers Platform
Background
The Turtlehead Tacos team maintains a separate Help Site for people with questions about online ordering or the mobile app. That site has its own content management process, which you don’t want to disrupt, but eventually you’ll want to surface answers to these support questions in your Search experience. To prepare, you’ll set up a crawler for the help article pages so you can start ingesting the content. You can add it to your Search experience later.
The help articles you’ll be working with are on help.turtleheadtacos.com.
Your Challenge
Navigate to Content > Configuration > Crawlers.
Click + New Crawler.
Fill out the crawler settings:
- Name: “Help Articles”
- Schedule: “Once” – for now you’re just going to do a one-time import, but later you may update this to do it daily or weekly.
- Crawl Strategy: Sub-Pages – you only want to crawl this part of the site.
- Domain to Crawl: https://help.turtleheadtacos.com – make sure not to add a trailing “/” on the URL
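The trailing-slash rule above is easy to miss when a URL is copied from a browser. As a minimal sketch, the settings could be prepared like this before you type them in; the dictionary keys and the helper name are hypothetical and do not represent a Yext API or file format.

```python
def normalize_crawl_domain(url: str) -> str:
    """Strip surrounding whitespace and any trailing slashes from a crawl domain."""
    return url.strip().rstrip("/")


# Hypothetical key names, used only to mirror the fields on the settings screen.
crawler_settings = {
    "name": "Help Articles",
    "schedule": "Once",
    "crawl_strategy": "Sub-Pages",
    "domain": normalize_crawl_domain("https://help.turtleheadtacos.com/"),
}

print(crawler_settings["domain"])  # prints https://help.turtleheadtacos.com
```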
Click “Save Crawler”. Congrats, you’ve created a Crawler!
It may take a few moments, but you should see a Crawl in Progress. The crawl is complete once 6 pages have been crawled. Wait until it completes (you might need to refresh the page).
Now that you’ve set up the Crawler and extracted the HTML from the pages, you can set up a Data Connector using the Crawler as a Source. Before you do that, you need to enable the Help Article entity type. Navigate to Content > Configuration > Entity Types and enable Help Article.
Navigate to Content > Connectors.
Click + Add a Connector.
Select Site Crawler.
Set your Crawler Extraction Settings and then click “Continue”.
- Crawler: Help Articles (the Crawler you created earlier)
- URLs: Select Specific URLs or URL Patterns and enter https://help.turtleheadtacos.com/* – this will pull all of the help articles but not the homepage
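To see why the pattern excludes the homepage, here is a rough sketch using Python's shell-style wildcard matching. This is an approximation: the connector's own pattern matching may behave differently, and the article path below is made up because the real paths are not listed in this challenge.

```python
from fnmatch import fnmatchcase

PATTERN = "https://help.turtleheadtacos.com/*"

candidate_urls = [
    "https://help.turtleheadtacos.com",  # homepage, as entered in the crawler settings
    "https://help.turtleheadtacos.com/example-article",  # hypothetical article path
]

# Keep only the URLs that match the pattern.
matched = [url for url in candidate_urls if fnmatchcase(url, PATTERN)]

for url in matched:
    print(url)
```

The homepage URL has nothing after the domain, so it lacks the "/" that the pattern requires and is left out.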
Select Detail page as the Page Type – each of the help articles is on its own page.
Click “Add Default Selectors”. You’ll see the Page ID, Page URL and Page Title pulled. Click Add Selector at the top to add a couple more selectors and then click “Continue”.
Add a selector with Header of “Body” and use the “Cleaned Body Content” as your specified Path. This will pull in the help body itself.
Add a selector using CSS Selector with Header of “Tags”. You want to pull in the tags at the bottom of the help articles like this one. Inspect the page and/or use https://try.jsoup.org/ to try to find the right CSS Selector to pull the list in. To modify your selector, remember that you can hover over the Tags column and click on the pencil icon to edit the selector and try again. If you can’t figure it out, you can find the CSS Selector in this gist.
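If you want to reason about what a list selector does before trying it in the connector, the sketch below selects list items from a small HTML fragment. The markup and the "tags" class name are invented for illustration and are not the real structure of the help articles, so this is not the answer to the step above. It uses the standard library's ElementTree path syntax, which here plays the role that a CSS selector such as `ul.tags > li` would play for this made-up markup.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment; the real help pages will look different.
SAMPLE_HTML = """
<div class="article">
  <h1>Example article</h1>
  <ul class="tags">
    <li>ordering</li>
    <li>mobile app</li>
  </ul>
</div>
""".strip()

root = ET.fromstring(SAMPLE_HTML)

# Find every <li> directly inside a <ul> whose class attribute is exactly "tags".
tags = [li.text.strip() for li in root.findall(".//ul[@class='tags']/li")]

print(tags)  # prints ['ordering', 'mobile app']
```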
Select Help Article as the Entity Type.
Now it’s time to map the selectors to fields on your Help Article Entity Type. You’ll see the Selectors from the last step in the first Column with a preview of the data in the second column. Update the “Map to Field” column with the following and then click “Save”.
- Page ID -> Entity ID
- Page URL -> Landing Page URL
- Page Title -> Name
- Body -> Body with subfield Markdown
- Tags -> Keywords mapped to an entire list with “,” as the Delimiter
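The mapping above can be pictured as a small transformation from one extracted row to one entity. This is only a sketch: the output key names are hypothetical stand-ins for the fields listed above, not Yext's actual field identifiers, and the sample row is invented.

```python
def map_row_to_entity(row: dict) -> dict:
    """Map one row of selector output to a Help Article-shaped dictionary."""
    return {
        "entityId": row["Page ID"],
        "landingPageUrl": row["Page URL"],
        "name": row["Page Title"],
        "body": {"markdown": row["Body"]},
        # Split the single Tags string into a list, using "," as the delimiter.
        "keywords": [tag.strip() for tag in row["Tags"].split(",") if tag.strip()],
    }


# Invented sample row; real values come from the crawled pages.
sample_row = {
    "Page ID": "example-id",
    "Page URL": "https://help.turtleheadtacos.com/example-article",
    "Page Title": "Example article",
    "Body": "Example body text.",
    "Tags": "ordering, mobile app",
}

entity = map_row_to_entity(sample_row)
print(entity["keywords"])  # prints ['ordering', 'mobile app']
```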
Click Save at the bottom of the page. You will be prompted to enter a Name and ID. Enter the following:
- Name: Help Articles
- ID: helpArticles
Click “Save & Run Now” to pull the entities into your account. Run in Default Mode.
Monitor your run to make sure it is successful. You should see 5 successful adds! Click Content > Entities to see your new “Help Article” entities.