Step 2: Are There Connector Issues?
If the expected data was crawled correctly, then the issue is likely on the connector side. Follow the steps below to troubleshoot common issues.
Missing URLs in Connector Settings
If the pages were crawled successfully but were not ingested by the connector, the connector settings are the likely cause.
Follow these steps to check whether the desired pages were extracted properly by the connector:
- On the Settings page for the connector, click View URLs. This displays the URLs crawled during the most recent successfully completed crawl; it does not include canceled crawls or pages crawled only in earlier crawls.
- If you are specifying any URL patterns, ensure that your wildcard notation is accurate (see the sketch after this list).
- Ensure that you are not unintentionally filtering out any URLs using a Filter Rows transform.
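A quick way to catch a bad wildcard is to test your pattern against a handful of known URLs locally before saving it in the connector. The sketch below uses Python's `fnmatch` for glob-style matching; the connector's actual wildcard syntax may differ, and the pattern and URLs shown are illustrative, not taken from a real configuration.

```python
# Sanity-check a glob-style URL pattern against sample crawled URLs.
# Note: the connector's wildcard semantics may not be identical to fnmatch.
from fnmatch import fnmatch

crawled_urls = [
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/archive/post-2",
    "https://www.example.com/about",
]

pattern = "https://www.example.com/blog/*"

for url in crawled_urls:
    # Caution: fnmatch's "*" crosses path segments (it matches "/"),
    # while some platforms restrict "*" to a single path segment.
    status = "match" if fnmatch(url, pattern) else "no match"
    print(f"{status}: {url}")
```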
Missing Data on Ingested Entities
If you can see the data in the HTML file but the connector cannot extract it, this is likely due to one of the following:
- Data outside the scope of the base selector is unreachable. If each URL is a list page, the connector can only reach data within each base selector container.
- Incorrect CSS or XPath notation in your selectors
- Transforms incorrectly altering the data
You may need to modify your base selector and selector paths to reach the desired data; the W3Schools CSS Selector Reference offers more guidance. The sketch below illustrates how base-selector scoping limits what a selector can reach.
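The following minimal sketch uses BeautifulSoup (`pip install beautifulsoup4`) to mimic how a base selector scopes extraction on a list page: field selectors are resolved relative to each base container, so matching elements elsewhere on the page are never reached. The HTML and selectors are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="listing">
  <div class="item"><h2>Item A</h2><span class="price">$10</span></div>
  <div class="item"><h2>Item B</h2><span class="price">$12</span></div>
</div>
<footer><span class="price">Prices in USD</span></footer>
"""

soup = BeautifulSoup(html, "html.parser")

# Base selector: one container per entity on the list page.
for base in soup.select("div.item"):
    # Field selectors are resolved inside the base container, so the
    # footer's .price element is out of scope and never extracted.
    name = base.select_one("h2").get_text(strip=True)
    price = base.select_one(".price").get_text(strip=True)
    print(name, price)
```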
Missing Entities
This is generally caused by invalid field mappings, which cause the creation of new entities to fail.
To troubleshoot field mappings, refer to the entity_diagnostics.csv and/or etl_diagnostics.csv file(s) in the downloadable results (found in the Activity Log for the connector).
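Once downloaded, the diagnostics files can be scanned with any CSV tooling. The sketch below simply prints the header and rows; the column names in your file are not assumed here, so inspect the printed header to find whichever status or error column your actual entity_diagnostics.csv or etl_diagnostics.csv contains.

```python
import csv

# Scan a downloaded diagnostics file. No column names are assumed;
# check reader.fieldnames to see what your file actually provides.
with open("entity_diagnostics.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print("Columns:", reader.fieldnames)
    for row in reader:
        # In practice, filter on a status/error column once you know
        # the real header names.
        print(row)
```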
Then, you can test the failed entity's data individually by entering it into the field via the UI or by sending it in an API request. Validation within the Knowledge Graph is the same whether the data is populated by the connector or by any other source.
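As a rough sketch of the API route, the snippet below re-submits a single field value and inspects the response. The endpoint URL, entity ID, field name, and auth parameter are all hypothetical placeholders; substitute the real Knowledge Graph API details for your account.

```python
import requests

response = requests.put(
    "https://api.example.com/v2/accounts/me/entities/entity-123",  # hypothetical endpoint
    params={"api_key": "YOUR_API_KEY"},                            # hypothetical auth
    json={"c_customField": "value that failed in the connector"},  # hypothetical field
)

# A 4xx response with a validation message confirms the field itself is
# rejecting the data, independent of the connector.
print(response.status_code, response.text)
```

Because validation is identical for every source, reproducing the error this way isolates the field mapping as the culprit rather than the connector's extraction logic.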