How can Lucene's scoring depend on relative po

Posted 2019-08-08 18:05

I use WhitespaceAnalyzer as the query analyzer. Suppose I have 2 documents:

| text | a b c |
| text | b a c |

text is a field.

Now the index structure is something like this:

| Term | In document   |
| a    | a b c / b a c |
| b    | a b c / b a c |
| c    | a b c / b a c |

And I have a query:

| text | a b c |

How can I get a higher score for a b c and a lower one for b a c?

Does Lucene support calculating score depending on relative position?

I found that this would help:

PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.setSlop(1);

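This way the two documents get different scores. Note that b a c only matches the phrase a b c when the slop is at least 2, because swapping two adjacent terms counts as two moves; with a slop of 1, only a b c matches.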

See more: http://www.blogjava.net/tangzurui/archive/2008/09/22/230357.html

And here I come across another question: https://stackoverflow.com/questions/18394532/how-can-lucenes-scoring-depend-on-terms-relative-position-in-the-document

2 Answers
家丑人穷心不美
#2 · 2019-08-08 18:30

It depends on which type of query you use. Some queries give a document a higher score when the phrase you search for appears in the correct order (e.g. new york versus york new). As the Lucene documentation suggests, you can use the score explanation to see why A B C gets a higher score than B A C.

Scoring is very much dependent on the way documents are indexed, so it is important to understand indexing (see Apache Lucene - Getting Started Guide and the Lucene file formats before continuing on with this section.) It is also assumed that readers know how to use the Searcher.explain(Query query, int doc) functionality, which can go a long way in informing why a score is returned.

http://lucene.apache.org/core/3_6_2/scoring.html

UPD. For storing term positions, if you are using Lucene 3, look at Field.TermVector: http://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/document/Field.TermVector.html

小情绪 Triste *
#3 · 2019-08-08 18:31

The score contribution of a phrase match depends on the distance:

  • highest score for distance=0 (exact match).
  • score gets lower as distance gets higher.

In your case, the query "a b c" matches the document "a b c" with distance 0, which gives the highest phrase score. For the document "b a c" the distance is greater than 0, so its score is lower.

For more details, look at the source code of the org.apache.lucene.search.SloppyPhraseScorer class.
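
To make the distance idea concrete, here is a rough sketch in plain Java. It is not Lucene's SloppyPhraseScorer: the class and method names are made up for illustration, and it assumes every query term occurs exactly once in the document. The distance is taken as the spread of (position in document minus offset in phrase) across the query terms, and the weight uses the 1 / (distance + 1) formula that DefaultSimilarity.sloppyFreq applies in Lucene 3.x.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Lucene.
class PhraseDistanceSketch {

    // Simplified match distance: for each query term, subtract its offset in the
    // phrase from its position in the document, then take max - min of those values.
    // Assumes every query term occurs exactly once in the document.
    static int distance(List<String> phrase, List<String> doc) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phrase.size(); offset++) {
            int position = doc.indexOf(phrase.get(offset));
            if (position < 0) {
                return -1; // a term is missing, so there is no phrase match
            }
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Same formula as DefaultSimilarity.sloppyFreq in Lucene 3.x.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("a", "b", "c");
        for (List<String> doc : Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("b", "a", "c"))) {
            int d = distance(phrase, doc);
            System.out.println(doc + " distance=" + d + " weight=" + sloppyFreq(d));
        }
    }
}
```

Under these assumptions "a b c" comes out at distance 0 with weight 1.0, and "b a c" at distance 2 with a weight of roughly one third. In Lucene itself a match is only counted when its distance does not exceed the query's slop, so "b a c" contributes to the score only with a slop of at least 2.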
