I’m setting up a data connector from a crawler and I’m struggling to use an XPath selector to isolate a bit of text. After a lot of trial and error, I can’t get only the text “wantThis”, because the selector always includes “dontWantThis1” and “dontWantThis2” as well. Is there any way to select only “wantThis”?
The XPath selector I’m currently using is: //ul[@class="ingredients-list"]/li/label/span[@class="ingredient-product-wrap"]
<input type="checkbox" id="ingredient-60da21613d856" class="fa ingredient-checkbox">
<span class="ingredient-product-wrap" itemprop="recipeIngredient">
<span class="imperial">dontWantThis1 </span>
<span class="metric hidden">dontWantThis2 </span>
I actually figured this out; I used the code below.
This selects only the text in the parent and not the text in its child spans.
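In XPath, a common way to get only an element's own text is to append `/text()` to the selector, which matches just the text nodes that are direct children of that element. As a rough illustration of the same idea (not necessarily the exact code used here), the Python sketch below compares an element's direct text with all of its text. The markup is modelled on the snippet in the question; where “wantThis” sits inside the outer span and the helper names `direct_text` and `all_text` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the snippet in the question; the position
# of "wantThis" inside the outer span is an assumption.
SNIPPET = """
<span class="ingredient-product-wrap" itemprop="recipeIngredient">
  <span class="imperial">dontWantThis1 </span>
  <span class="metric hidden">dontWantThis2 </span>
  wantThis
</span>
"""


def direct_text(element):
    """Return only the text nodes that are direct children of element."""
    parts = [element.text or ""]
    # ElementTree stores text that follows a child (but is still inside the
    # parent) on that child's .tail, so it belongs to the parent.
    parts.extend(child.tail or "" for child in element)
    return " ".join("".join(parts).split())


def all_text(element):
    """Return the text of element and of all its descendants."""
    return " ".join("".join(element.itertext()).split())


wrap = ET.fromstring(SNIPPET)
print(direct_text(wrap))  # wantThis
print(all_text(wrap))     # dontWantThis1 dontWantThis2 wantThis
```

The first result corresponds to what a `/text()` step would return for the outer span, and the second to what selecting the span as a whole returns.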
Awesome! Yes, that would definitely work! Impressive use of XPath.
You could also consider using the “Direct Text” Extract Setting instead of the default “Text” setting.
The difference is that the “Text” setting will extract all text contained within the element you specify, including the text of any of that element’s children (e.g. dontWantThis1 and dontWantThis2 in addition to wantThis), whereas the “Direct Text” setting will only extract the text directly within the element you specified and not its children (e.g. just wantThis).
Hope that helps!
That is a great solution! Thanks so much for making me aware of this!