Crawler - Select specific address components within one span

I’m trying to pull provider information for a new entity into the KG from a website via the crawler + data connector. I’ve set up the majority of the data connectors I need, but I’m struggling to get the address data set up correctly. When I use a CSS or XPath selector, the address data gets pulled in as “City State Zip” in one text string, and the data connector can’t parse this into the respective fields in the entity. How do I pull in just a single set of text (one line within a span) with CSS or XPath?

URL with the address info I’m trying to pull:
Address line I’m working on: Duarte, CA 91010

CSS selector I’ve tried:

tab0-0 > div > div.bio_location.col > div > div > div.loc-item-content > div.loc-item-address > span:nth-child(5)

XPath selector I’ve tried (isolated to just “Duarte”):


Another detail worth noting: the address field is required in order to add this entity type (HC professional) to this KG. So without this connector set up correctly, I can’t use the crawler + data connector flow for the use case I’m trying to solve for.

Thank you!

Hi @Laura_Canale ,

As a quick workaround, you might consider setting up a custom entity type (to get around the required-field hurdle), using a Data Connector to load the data fetched by the Crawler into entities of this type (with the address details all in one field), and then exporting that data and using regular expressions to split it into the field structure you need for your HC Professional entities.
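For the regular-expression step, here is a minimal sketch in Python of how the exported string could be split. It assumes the exported value looks like the example from the question ("Duarte, CA 91010": a city, a two-letter state, and a 5-digit or ZIP+4 code, with an optional comma). The function name `split_city_state_zip` and the output keys are hypothetical and would need to be mapped to whatever the HC Professional address fields expect.

```python
import re

# Assumed input shape: "City, ST 12345" or "City ST 12345-6789".
# The pattern is an illustration only; adjust it if the crawled text differs.
ADDRESS_RE = re.compile(
    r"^\s*(?P<city>[^,]+?)\s*,?\s*"   # city: everything before the state, comma optional
    r"(?P<state>[A-Za-z]{2})\s+"      # two-letter state abbreviation
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"  # 5-digit ZIP, optionally ZIP+4
)


def split_city_state_zip(line):
    """Return a dict with city, state and zip, or None if the line does not match."""
    match = ADDRESS_RE.match(line)
    if match is None:
        return None
    return {
        "city": match.group("city"),
        "state": match.group("state").upper(),
        "zip": match.group("zip"),
    }


print(split_city_state_zip("Duarte, CA 91010"))
```

Rows that return `None` do not fit the assumed pattern and are probably best reviewed by hand before re-importing.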

I don’t think there is a way to meet your requirements with a Data Connector directly at this time, given the somewhat limited choice of selectors currently supported.


Hi @Stefan_Heidbrink - thank you for the quick and helpful response. I will set up a custom entity type as a workaround. I appreciate the input / perspective. And fingers crossed that the selector options evolve and expand from where they are today!

Cheers and happy hitchhiking,
