Web Crawler - Random important elements in large element

Hello, I have two questions.
I hope you can help me.

  1. With the web crawler, selectors like :first-child do not seem to work properly. What could be the reason for this?

  2. For hotels, there is an info page for this application that comes from an external service, so a web crawler must be used, e.g. https://www.chiemsee-alpenland.de/chiemsee/ukv/house/Bad-Feilnbach-Pension-Gaestehaus-Huber-DEU00000060002247028.

Here you can see a category “Equipment & Information”.
It contains several relevant pieces of data, each of which I would like to use as a separate attribute for the graph. However, the data is in random order and does not have its own selector. Is the NLP good enough to use all of the information as one attribute? The language is German.

With kind regards
Simon

Hi Simon,

Yes, I see that with the way this page is set up, it is a bit difficult to separate the different headers under the “Equipment & Information” section into different fields since the headers all use the same selector.

If I understand correctly, as an example, you would want a setup like below:

  • Meals field with a text list containing “Shopping service before arrival” and “breakfast”
  • Breakfast field with a text list containing “Bread service”, “breakfast buffet”, and “Regional specialties”

However, using the selector .tp-characteristics__text as you have done in your account would pull in all the attributes together in one long string, i.e. “Shopping service before arrival, breakfast, Bread service, breakfast buffet, Regional specialties”. You could use a “Split to Column” transform with a comma as the separator to break this string into separate columns (and thus fields). That would map each individual attribute to its own field, though. Without some kind of separator between the “meals” and “breakfast” categories, you won’t be able to group the attributes by category automatically.
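
To illustrate the limitation, here is a minimal Python sketch of what a comma split does to that string. It is not the crawler's actual transform; the function name split_to_columns and the column_N labels are made up for this example.

```python
def split_to_columns(raw: str, separator: str = ",") -> list[str]:
    """Split one scraped string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


# The combined string from the example above.
scraped = (
    "Shopping service before arrival, breakfast, Bread service, "
    "breakfast buffet, Regional specialties"
)

columns = split_to_columns(scraped)

# Each value lands in its own column; nothing in the string marks where
# the "Meals" values end and the "Breakfast" values begin.
for index, value in enumerate(columns, start=1):
    print(f"column_{index}: {value}")
```

The result is five flat values with no category information, which is why the grouping cannot be recovered after the split.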

The other option is to have all attributes under the same field, which I think is what your question about NLP is referring to. If you expect people to search your Answers experience for something like “hotels with breakfast”, you can add a text search to this field. I’d caution against using an NLP filter, since there are similar attributes such as “breakfast” and “breakfast buffet”. An NLP filter would narrow down to the one best match and create a black-and-white filter from there. Check out the Searchable Fields Best Practices unit to learn more.

You could also add a searchable facet to this attributes field (with or without text search as well) so that users can check off the features they want. See an example with the Publishers vertical in the Hitchhikers search.
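
As a rough picture of how a facet on a single attributes field behaves, here is a small Python sketch. It only illustrates the idea of exact-value matching and says nothing about how the product implements facets; the hotel names and attribute lists are hypothetical sample data.

```python
from collections import Counter

# Hypothetical entities, each with one multi-value attributes field.
hotels = {
    "Pension A": ["breakfast", "Bread service"],
    "Hotel B": ["breakfast buffet", "Regional specialties"],
    "Hotel C": ["breakfast", "breakfast buffet"],
}


def facet_counts(entities: dict[str, list[str]]) -> Counter:
    """Count how many entities carry each attribute value."""
    counts: Counter = Counter()
    for attributes in entities.values():
        counts.update(set(attributes))
    return counts


def filter_by_facets(entities: dict[str, list[str]], selected: list[str]) -> list[str]:
    """Return the entities that have every selected attribute value."""
    wanted = set(selected)
    return [name for name, attributes in entities.items() if wanted <= set(attributes)]


print(facet_counts(hotels))
print(filter_by_facets(hotels, ["breakfast buffet"]))
```

Because each checkbox matches an exact value, “breakfast” and “breakfast buffet” stay distinct options that users can combine as they like.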

If you have another idea to accomplish what you want, but the crawler product can’t support it, feel free to submit a product request to the Ideas board. We appreciate any feedback!
