File Search | Yext Hitchhikers Platform

Search allows users to search directly on file fields. This includes extracting text content from files and searching on that content as if it were a regular multi-line text field.

If you want to return the extracted contents of the file in the API response, you can do so with display fields. If you add c_file.s_content to the display fields, we will return the contents of the whole file in the response. This may degrade API performance if the file contents are very large.

Below is a list of supported file types and limitations of file search.

Supported File Types

  • doc
  • docx
  • html
  • odt
  • md
  • markdown
  • pdf
  • ppt
  • pptx
  • rtf
  • txt
  • xls
  • xlsx
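
As a quick pre-upload check, the list above can be turned into a small helper. This is a minimal sketch rather than part of the platform: the function name `is_supported_file` is hypothetical, and it assumes that support is determined by the file extension and that the comparison is case-insensitive, neither of which is stated here.

```python
from pathlib import Path

# Extensions copied from the supported file types list above.
SUPPORTED_EXTENSIONS = frozenset({
    "doc", "docx", "html", "odt", "md", "markdown", "pdf",
    "ppt", "pptx", "rtf", "txt", "xls", "xlsx",
})


def is_supported_file(filename: str) -> bool:
    """Return True if the filename's extension appears in the supported list.

    Hypothetical helper: assumes the extension alone decides support and
    that matching is case-insensitive.
    """
    # Path.suffix keeps the leading dot (".pdf"), so strip it before comparing.
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

Calling `is_supported_file("handbook.pdf")` returns `True`, while a name such as `"photo.png"` or one with no extension returns `False`.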

Limitations

  • Files must contain raw text. For example, a PDF of a scanned document (which is just an image) will not return any text. A good test is whether or not you can highlight the text in the file.
  • A maximum of 100 KB of text can be extracted from files per entity. That is roughly 50 single-spaced pages (the exact figure varies by font and size). Any text beyond 100 KB is truncated.
  • File Search does not support substring match highlighting or navigating to the page/section in the document where the text match occurred.
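
To get a feel for the extraction limit, the sketch below estimates whether a piece of already-extracted text would be cut off and what the kept portion might look like. It is illustrative only: the names `MAX_EXTRACTED_BYTES`, `exceeds_limit`, and `truncate_to_limit` are hypothetical, and it assumes the limit is 100 × 1024 bytes of UTF-8 text, which is not specified here. The platform's actual measurement and cut-off point may differ.

```python
# Assumption: the limit is 100 * 1024 bytes of UTF-8 encoded text.
MAX_EXTRACTED_BYTES = 100 * 1024


def exceeds_limit(text: str, limit: int = MAX_EXTRACTED_BYTES) -> bool:
    """Return True if the UTF-8 encoding of text is larger than limit bytes."""
    return len(text.encode("utf-8")) > limit


def truncate_to_limit(text: str, limit: int = MAX_EXTRACTED_BYTES) -> str:
    """Return the leading part of text that fits within limit bytes.

    Hypothetical model of the truncation, not the platform's implementation.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # Cut at the byte limit; errors="ignore" drops a multi-byte character
    # that the cut would otherwise split in half.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

If `exceeds_limit` returns `True` for a file's text, anything past the kept portion would not be searchable under these assumptions, so it may be worth splitting long documents across multiple entities.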