Lucene scoring: in what context is queryNorm used?

Posted 2019-04-08 18:30

I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula looks like this:

score(q,d) = coord(q,d) x queryNorm(q) x SUM<t in q>( tf(t in d) x idf(t)^2 x t.getBoost() x norm(t,d) )

I understand every component in this formula except queryNorm(q). As explained by the official documentation,

queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable.

Why would I need to compare scores between different queries? In other words, could you give an example of a context in which queryNorm(q) is useful?

2 answers
Rolldiameter
#2 · 2019-04-08 18:43

Good question, I've wondered about this myself. According to the ScoresAsPercentages argument, comparing scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.

My understanding is that, while queryNorm doesn't make scores strictly comparable, it does help: they are closer to comparable with the default queryNorm than without it.
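To make the role of queryNorm concrete, here is a minimal Python sketch of the classic TF-IDF formula from the question. It assumes the default similarity, where queryNorm is 1 / sqrt(sumOfSquaredWeights), tf is sqrt(freq), and idf is 1 + ln(numDocs / (docFreq + 1)). Boosts are taken as 1, coord is left out, and the document frequencies and term counts are made-up illustrative numbers, so treat it as an approximation of the idea rather than Lucene's actual code path.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(term_weights):
    # 1 / sqrt(sumOfSquaredWeights); each weight is idf(t) * boost(t), boost assumed 1
    return 1.0 / math.sqrt(sum(w * w for w in term_weights))

def score(term_freqs, idfs, field_norm, qnorm=1.0):
    # qnorm * SUM(tf * idf^2 * norm), with tf = sqrt(freq); coord omitted for brevity
    return qnorm * sum(
        math.sqrt(f) * i * i * field_norm for f, i in zip(term_freqs, idfs)
    )

NUM_DOCS = 1000  # hypothetical index size
idfs = [idf(5, NUM_DOCS), idf(400, NUM_DOCS)]  # one rare term, one common term
qn = query_norm(idfs)

# Hypothetical per-document term frequencies for the two query terms
docs = {"doc_a": [3, 1], "doc_b": [1, 4], "doc_c": [0, 2]}
raw = {name: score(f, idfs, 0.5) for name, f in docs.items()}
normalized = {name: score(f, idfs, 0.5, qn) for name, f in docs.items()}

ranking_raw = sorted(raw, key=raw.get, reverse=True)
ranking_normalized = sorted(normalized, key=normalized.get, reverse=True)
print(ranking_raw, ranking_normalized)
```

Both rankings come out identical, because every document's score is multiplied by the same factor. What changes is the scale: after normalization the query's weight vector has unit length, which is what brings scores from queries with very different idf values somewhat closer together.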

I suppose it could also enable people to write their own similarity, and use this call to create normalized, comparable scores, using algorithms that work in their particular case.

There has been some discussion on dropping it, which you might find interesting.

孤傲高冷的网名
#3 · 2019-04-08 18:43

I know the question is old but I had a similar problem. The reason why queryNorm was not the same on all search results is that documents can be in different shards and the queryNorm is constant only within the same shard.

From my understanding this problem can be solved in 2 ways:

  • naturally, once there is a lot of data, because term statistics then even out across shards

  • by setting the number of shards to 1. Of course, this has consequences for performance.

    { "settings": { "number_of_shards" : 1 } }

See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-is-broken.html
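As a rough illustration of why queryNorm can differ between shards, the Python sketch below computes it from shard-local term statistics. It assumes the classic idf, 1 + ln(numDocs / (docFreq + 1)), and queryNorm = 1 / sqrt(sumOfSquaredWeights); the shard names and all document counts are hypothetical.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(term_weights):
    # 1 / sqrt(sumOfSquaredWeights), with all boosts assumed to be 1
    return 1.0 / math.sqrt(sum(w * w for w in term_weights))

# Hypothetical statistics for a two-term query:
# (documents in shard, docFreq of term 1, docFreq of term 2)
shards = {
    "shard_0": (100, 3, 40),
    "shard_1": (100, 20, 45),
}

# Each shard computes idf, and therefore queryNorm, from its own statistics
per_shard = {
    name: query_norm([idf(df1, n), idf(df2, n)])
    for name, (n, df1, df2) in shards.items()
}

# With a single shard the same documents share one set of statistics
total_docs = sum(n for n, _, _ in shards.values())
total_df1 = sum(df1 for _, df1, _ in shards.values())
total_df2 = sum(df2 for _, _, df2 in shards.values())
single_shard = query_norm([idf(total_df1, total_docs), idf(total_df2, total_docs)])

print(per_shard, single_shard)
```

The two shards produce different queryNorm values for the same query because the first term is much rarer on one of them. With one shard there is only one set of statistics and hence one queryNorm, and with enough data the per-shard document frequencies tend to converge, so the gap shrinks.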
