Lucene phrase query with terms in OR

Posted 2019-05-28 10:40

Suppose I have the following documents, each with a field text containing:

  1. the red house is beautiful
  2. the house is little
  3. the red fish
  4. the red and yellow house is big

What kind of query should I use so that a search for "red house" returns the documents ranked as follows?

  1. the red house is beautiful and big [matching: red house]
  2. the red and yellow house is big [matching: red x x house]
  3. the house is little [matching: house]
  4. the red fish [matching: red]

I need documents that match the whole phrase to get a high rank, and documents that match only part of the phrase to get a lower score. Note that the query string may contain more than two terms.

It is like a PhraseQuery in which each term is optional, and in which the closer together the terms are, the higher the score.

I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need.

How can I do this?

Thanks

Tags: search lucene
2 answers
你好瞎i
#2 · 2019-05-28 11:30

Try creating a BooleanQuery composed of TermQuery objects, combined with OR (BooleanClause.Occur.SHOULD). This will match documents where only one term appears, but should give a higher score to those where both appear.

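// Uses the mutable BooleanQuery constructor of older Lucene releases;
// newer releases build the same query with BooleanQuery.Builder.
Query term1 = new TermQuery(new Term("text", "red"));
Query term2 = new TermQuery(new Term("text", "house"));
BooleanQuery booleanQuery = new BooleanQuery();
// SHOULD makes each clause optional; every clause that matches adds to the score.
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
booleanQuery.add(term2, BooleanClause.Occur.SHOULD);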
SAY GOODBYE
#3 · 2019-05-28 11:37

I think a PhraseQuery with a positive slop (set via setSlop), SHOULD-combined with a TermQuery for every term, should get you there. A boost on the PhraseQuery may help as well.
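To see why this combination tends to produce the requested order, here is a toy model in plain Java. It is not Lucene code and not Lucene's real scoring: the class ToyPhraseRanker, its methods, and the constants (one point per matching term, a slop of 3, a phrase boost of 2.0) are all assumptions made for illustration. The distance measure and the 1 / (1 + distance) weighting are only loosely modelled on how sloppy phrase matching rewards proximity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy scorer: optional term clauses plus a slop-bounded phrase bonus.
class ToyPhraseRanker {

    // 0-based positions at which `term` occurs in `tokens`.
    public static List<Integer> positions(String[] tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) out.add(i);
        }
        return out;
    }

    // Smallest spread of (position - index in phrase) over all occurrences.
    // 0 means the terms form an exact phrase; -1 means a term is missing.
    public static int minDistance(String[] tokens, String[] terms) {
        if (terms.length == 0) return -1;
        List<List<Integer>> pos = new ArrayList<>();
        for (String t : terms) {
            List<Integer> p = positions(tokens, t);
            if (p.isEmpty()) return -1;
            pos.add(p);
        }
        return search(pos, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Brute force over every combination of occurrences (fine for short texts).
    private static int search(List<List<Integer>> pos, int i, int lo, int hi) {
        if (i == pos.size()) return hi - lo;
        int best = Integer.MAX_VALUE;
        for (int p : pos.get(i)) {
            int shifted = p - i;
            best = Math.min(best,
                    search(pos, i + 1, Math.min(lo, shifted), Math.max(hi, shifted)));
        }
        return best;
    }

    // One point per query term present, plus a bonus that shrinks as the
    // terms drift apart and disappears once they are further apart than `slop`.
    public static double score(String text, String[] terms, int slop, double phraseBoost) {
        String[] tokens = text.toLowerCase().split("\\s+");
        double score = 0.0;
        for (String t : terms) {
            if (!positions(tokens, t).isEmpty()) score += 1.0;
        }
        int d = minDistance(tokens, terms);
        if (d >= 0 && d <= slop) score += phraseBoost / (1.0 + d);
        return score;
    }

    public static void main(String[] args) {
        String[] query = {"red", "house"};
        String[] docs = {
            "the red house is beautiful",
            "the house is little",
            "the red fish",
            "the red and yellow house is big"
        };
        for (String doc : docs) {
            System.out.println(score(doc, query, 3, 2.0) + "  " + doc);
        }
    }
}
```

In this model the exact phrase scores highest, the "red x x house" document comes second, and the two single-term documents tie. In Lucene itself the order of those last two would depend on term statistics and field length, so it is not guaranteed to match the order requested in the question.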

You wrote: "I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need."

What do you get with this combination, and how is it not what you need?
