aleph: Highlighting of search terms does not work on Documents
I have set `ALEPH_RESULT_HIGHLIGHT=true` in `aleph.env`, and highlighting of the search term seems to work on entity types other than Documents. For example, searching for a term that produces hits in CourtCase entities displays a highlight of the search term if the term appears in, e.g., the Summary field of CourtCase.
For Documents highlighting works if the search term is in the title attribute of the Document entity (or other attributes), but not if the search term is in the indexed text of the document.
I can see from the text tab in the detail view of the document that the text has been indexed correctly, but still no highlight is displayed.
A search result limited to `CourtCase` entities returns the highlight attribute. Limiting to `Document` (Pages) entities will not return the highlight attribute on the results.
The strange thing is that highlighting seems to work if the entity is of type `Image` which has been OCRed into text. Then the search term (if found in the OCRed text) will appear as highlighted.
If I do a search within the document when viewing the Document entity itself (the scope is only that Document), then highlighting works and the Pages returned contain the `highlight` attribute.
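The comparison described above can be reproduced with a small script. This is only a sketch: the `/api/2/entities` path and the `filter:schema` and `highlight` query parameters are assumptions about the Aleph search API, and `build_search_url` and `missing_highlights` are hypothetical helper names, not part of Aleph.

```python
from urllib.parse import urlencode


def build_search_url(base_url, term, schema):
    """Build a search URL limited to one schema.

    Assumption: the path and parameter names mirror what the Aleph UI
    sends; adjust them to match your instance.
    """
    params = {"q": term, "filter:schema": schema, "highlight": "true"}
    return f"{base_url}/api/2/entities?{urlencode(params)}"


def missing_highlights(response):
    """Return the ids of results without a non-empty 'highlight' attribute."""
    return [
        result.get("id")
        for result in response.get("results", [])
        if not result.get("highlight")
    ]
```

Fetching the URL for `CourtCase` and then for `Pages`, and passing each decoded JSON response to `missing_highlights`, should show whether the `highlight` attribute is absent only for the document schemata.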
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 15 (10 by maintainers)
The search matches on a `text` field that is excluded from search results by default. Also, this text field is not part of the `properties.*` field that is used in the entities highlight query, so it won’t be included in the highlight.
After including this field in results & highlights, highlight results are weird, with elastic highlighting wrong parts of the text, which seems to be caused by the `term_vector` configuration. When I comment this line, I get proper search highlights.
I’ve asked @pudo for a bit of explanation before I start tackling this, as I’d love to know a bit more about the reasoning behind excluding these text fields etc.
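To illustrate the point about the highlight fields, here is a minimal sketch of an Elasticsearch request body where `text` is highlighted alongside `properties.*`. It is not Aleph's actual query builder: `build_highlight` and `build_query` are hypothetical names, and the fragment size and count are arbitrary defaults chosen for the example.

```python
def build_highlight(fields, fragment_size=120, number_of_fragments=3):
    """Build the 'highlight' section of an Elasticsearch search body."""
    return {
        "encoder": "html",
        # Highlight matches even when the query targeted another field.
        "require_field_match": False,
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fields": {
            field: {
                "fragment_size": fragment_size,
                "number_of_fragments": number_of_fragments,
            }
            for field in fields
        },
    }


def build_query(term, include_text=True):
    """Build a search body; 'text' is highlighted only if include_text is set."""
    fields = ["properties.*"]
    if include_text:
        fields.append("text")
    return {
        "query": {"query_string": {"query": term}},
        "highlight": build_highlight(fields),
    }
```

With `include_text=False` the body only asks for highlights on `properties.*`, which matches the behaviour described in the comment: a hit in the indexed document text produces a result but no highlight.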
Hmm, after looking at this more closely, I think we’re actually discussing 2 separate things here.
`Pages` entities, which is an issue.
Would it be an idea to open a separate issue to track point 2, the page context in search?
Also, when I’ve got some time next week, I’ll see if I can investigate the missing highlights.
Should be fixed with https://github.com/alephdata/aleph/pull/2416
Hi @anderser, @sunu, how are you? ^^ @monneyboi said I should come to the party 😃
I think knowing the page number is relevant because you might want to look for more context on the pages; the context might be a lot larger than the snippet. Or you can see that a search term is concentrated in a certain part of the document. It is valuable info imho.