Solr highlighting gives field/snippets with ANY te

Posted 2019-09-08 01:07

I'm using Solr 5.x with the standard highlighter, and I'm getting snippets that match only one of the search terms, even when I specify q.op=AND. I need ONLY the fields and snippets that match ALL the terms (unless I set q.op=OR or omit it), i.e. the field/snippet must satisfy the query. Solr does return the field/snippet that has all the terms, but it also returns many others.

I'm using hl.fl=* to get only the fields that contain the terms, and I'm searching against the default field ('text', which contains the full document). I need to use * since I have multiple dynamic fields. Most fields are of type 'text_general' (for search and highlighting), and some are of type 'string' for faceting.

If it's not possible for snippets to contain all the terms, I MUST at least get only the fields that fully satisfy the query. (This question talks mostly about matching all the terms, but the search query can become arbitrarily complex, so the fields/snippets should match the query as a whole.)

Also, the next step is to get highlighted snippets for proximity-based searches. What should I do or use for this? The fields returned in the highlighting should also satisfy the proximity query (rather than my getting a field that contains any term, without regard to proximity constraints, other query terms, etc.).

Thanks for your help.

1 answer
我只想做你的唯一
Reply #2 · 2019-09-08 01:16

I've encountered the same problem with highlighting. In my case, a query like

(foo AND bar) OR eggs

highlighted eggs and foo even though bar was not present in the document. I didn't manage to come up with a proper solution, but I devised a dirty workaround.

I use the following query:

id:highlighted_document_id AND text:(my_original_query)

with debugQuery set to true. Then I parse the explain text for highlighted_document_id. The text contains the terms from the query that contributed to the score. The terms that should not be highlighted are not present in the explanation.
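As a rough sketch of that request, the parameters could be assembled like this. The helper names build_debug_url and explain_for, the core URL, and the quoting of the id are illustrative assumptions, not part of the original workaround; q, debugQuery, hl, hl.fl and wt are regular Solr request parameters, and with wt=json the explain text is expected under debug.explain, keyed by document id.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, doc_id, original_query, field="text"):
    """Build a /select URL that restricts the query to one document
    and asks Solr for debug (explain) output.

    base_url is a hypothetical core URL, e.g. http://localhost:8983/solr/mycore
    """
    params = {
        # Same shape as the query above: id:<doc> AND text:(<original query>)
        "q": f'id:"{doc_id}" AND {field}:({original_query})',
        "debugQuery": "true",
        "hl": "true",
        "hl.fl": "*",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


def explain_for(response, doc_id):
    """Return the explain text for doc_id from a parsed JSON response,
    or an empty string if the debug section is missing."""
    return response.get("debug", {}).get("explain", {}).get(doc_id, "")
```

The URL can then be fetched with any HTTP client and the JSON body passed to explain_for.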

The Python regular expressions I use to extract the terms (valid for Solr 5.2.1):

import re

term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')

Then I simply search for the highlight markers in the highlighted text and remove them if the marked term doesn't match any of the terms extracted by term_regex and wildcard_term_regex.
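A minimal sketch of that filtering step might look like the following. It assumes the default <em>...</em> markers of the standard highlighter and compares each marked word with the extracted terms case-insensitively, treating a term as a glob pattern so that something like foo* can match. The function names and the matching rule are my assumptions; stemmed terms and phrase queries would need extra handling.

```python
import re
from fnmatch import fnmatchcase

# The two patterns from the answer (reported as valid for Solr 5.2.1).
term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')

# Assumes the default highlight markers <em> and </em>.
marked_regex = re.compile(r'<em>(.*?)</em>')


def contributing_terms(explain_text):
    """Collect the terms that appear in the explain text, lowercased."""
    terms = set(term_regex.findall(explain_text))
    terms.update(wildcard_term_regex.findall(explain_text))
    return {t.lower() for t in terms}


def strip_unmatched_marks(snippet, terms):
    """Remove <em> markers around words that match none of the terms."""
    def keep_or_strip(match):
        word = match.group(1).lower()
        if any(fnmatchcase(word, term) for term in terms):
            return match.group(0)   # keep the highlighted form
        return match.group(1)       # drop the markers, keep the word
    return marked_regex.sub(keep_or_strip, snippet)
```

For each highlighted document, contributing_terms would be fed the explain text for that document, and strip_unmatched_marks applied to every snippet returned in the highlighting section.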

The solution is probably pretty limited, but it works for me.
