How can Lucene's scoring depend on relative po

I use WhitespaceAnalyzer as the query analyzer. Suppose I have 2 documents:

| text | a b c |
| text | b a c |

text is a field.

Now the index structure is something like this:

| Term | In documents   |
| a    | a b c / b a c  |
| b    | a b c / b a c  |
| c    | a b c / b a c  |
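The table above lists only which documents contain each term. By default, Lucene's postings for an analyzed field also record the position of each occurrence, and that is what position-sensitive queries rely on. Below is a minimal sketch of that idea in plain Java. It is not Lucene's API or file format; the class name `PositionalIndexSketch` and its methods are hypothetical and only illustrate the data structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a positional inverted index (not Lucene code).
final class PositionalIndexSketch {

    // term -> (document id -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Tokenizes on whitespace (similar in spirit to WhitespaceAnalyzer) and records positions.
    void add(int docId, String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue; // empty input produces one empty token; skip it
            }
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Returns the positions of a term in a document, or an empty list if absent.
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc =
                postings.getOrDefault(term, Collections.<Integer, List<Integer>>emptyMap());
        return byDoc.getOrDefault(docId, Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.add(0, "a b c");
        index.add(1, "b a c");
        // The term "a" sits at a different position in each document.
        System.out.println("a in doc 0: " + index.positions("a", 0));
        System.out.println("a in doc 1: " + index.positions("a", 1));
    }
}
```

With positions available, the two documents are no longer indistinguishable: every term appears in both, but at different offsets.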

And I have a query:

| text | a b c |

How can I get a higher score for a b c and a lower one for b a c?

Does Lucene support calculating score depending on relative position?

I found that this would help:

PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.setSlop(1);

This way the two documents get different scores.

See more: http://www.blogjava.net/tangzurui/archive/2008/09/22/230357.html

And here I come across another question: https://stackoverflow.com/questions/18394532/how-can-lucenes-scoring-depend-on-terms-relative-position-in-the-document

Tags: java search lucene

2 Answers

家丑人穷心不美

Reply #2 · 2019-08-08 18:30

It depends on which type of query you use. Some queries give a higher score when the phrase you search for appears in the correct order (e.g. new york vs. york new). According to the Lucene documentation, you can use the score explanation to see why A B C gets a higher score than B A C.

Scoring is very much dependent on the way documents are indexed, so it is important to understand indexing (see Apache Lucene - Getting Started Guide and the Lucene file formats before continuing on with this section.) It is also assumed that readers know how to use the Searcher.explain(Query query, int doc) functionality, which can go a long way in informing why a score is returned.

http://lucene.apache.org/core/3_6_2/scoring.html

UPD: For storing term positions, if you are using Lucene 3, look at this: http://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/document/Field.TermVector.html


小情绪 Triste *

Reply #3 · 2019-08-08 18:31

The score contribution of a phrase match depends on the distance:

The highest score is for distance = 0 (an exact match).
The score decreases as the distance increases.

In your case, the query "a b c" matches the document "a b c" with distance 0, which gives the highest phrase score. For the document "b a c" the distance is greater than 0, so its score is lower, provided the query's slop is large enough for it to match at all.

For more details, see the source code of the org.apache.lucene.search.SloppyPhraseScorer class.
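The distance rule described in this answer can be sketched in plain Java. This is a simplified illustration, not Lucene's implementation: it assumes each query term occurs once in the document and ignores repeated terms. The class and method names are hypothetical. The weighting `1 / (distance + 1)` mirrors what `DefaultSimilarity.sloppyFreq` does in Lucene 3.x. The final document score also depends on other factors such as idf and field norms, which are left out here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how a sloppy phrase match distance can be derived (not Lucene code).
final class SloppyPhraseSketch {

    // Shifts each term's document position by its offset in the query; the spread of the
    // shifted positions is the match distance. Returns -1 if a query term is missing.
    static int matchDistance(List<String> queryTerms, List<String> docTokens) {
        if (queryTerms.isEmpty()) {
            throw new IllegalArgumentException("query must contain at least one term");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < queryTerms.size(); offset++) {
            int position = docTokens.indexOf(queryTerms.get(offset)); // first occurrence only
            if (position < 0) {
                return -1;
            }
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Weight of a match at the given distance: 1 for an exact match, smaller as distance grows.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c");
        for (String text : new String[] {"a b c", "b a c"}) {
            int distance = matchDistance(query, Arrays.asList(text.split(" ")));
            System.out.println(text + " -> distance " + distance
                    + ", weight " + sloppyFreq(distance));
        }
    }
}
```

Under these assumptions "a b c" has distance 0 and "b a c" has distance 2, because swapping two adjacent terms counts as two moves. A phrase query therefore needs a slop of at least 2 before "b a c" matches at all, and when it does match, its phrase weight is lower than that of the exact match.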
