What is the best Lucene setup for ranking exact matches…

Posted 2019-04-07 17:18

Which analyzers should be used for indexing and for searching when I want an exact match to rank higher than a "partial" match? Or should I set up custom scoring in a Similarity class?

For example, when my index consists of "car parts", "car", and "car shop" (indexed with StandardAnalyzer on Lucene 3.5), a query for "car" returns:

car parts
car
car shop

(They are basically returned in the order in which they were added, since they all get the same score.)

What I would like to see is "car" ranked first, followed by the other results (their order doesn't really matter; I assume the analyzer can influence that).

Tags: java lucene analyzer

2 answers

Answer by chillily · #2 · 2019-04-07 17:56

All three matches are exact: the whole term car is matched, not a fragment such as 'ca' or 'ar'. :)
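To make the "whole term" point concrete, here is a stdlib-only Java sketch. It does not use Lucene: it approximates StandardAnalyzer for these simple titles by lowercasing and splitting on whitespace, which is an assumption that only holds for plain ASCII words like these. Each title becomes a list of whole terms, and the query term car equals one of them in all three documents, while a fragment like ca matches none.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TermMatchDemo {
    // Rough stand-in for StandardAnalyzer on simple ASCII titles:
    // lowercase, then split on runs of whitespace. Not Lucene's real tokenizer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // A document matches when one of its terms equals the query term exactly.
    static boolean matches(String title, String queryTerm) {
        return tokenize(title).contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String title : new String[] {"car parts", "car", "car shop"}) {
            System.out.println(title + " -> " + tokenize(title)
                    + " matches car: " + matches(title, "car"));
        }
    }
}
```

Every title yields the term car, so term matching alone cannot tell the three documents apart; something else (such as the length norm) has to break the tie.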

If there's no other content in these fields ("car parts", "car" and "car shop"), you could use lengthNorm() or computeNorm() (depending on the Lucene version) to give shorter fields more weight, so that car gets a higher score for being shorter. In Lucene 3.3.0, DefaultSimilarity.computeNorm() looks like this:

return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));

where numTerms is the total number of terms in the field. So it's surprising that the "car" and "car shop" documents have the same score: for "car" the norm is 1, while for "car shop" it should be about 0.7 (1/√2 ≈ 0.707, assuming a boost of 1).
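A quick check of that arithmetic, as a stdlib-only Java sketch mirroring the quoted return statement (boost × 1/√numTerms). It does not call Lucene. It assumes each title is the entire field, the boost is 1, and numTerms is simply the whitespace token count. It also skips the lossy one-byte encoding Lucene applies when storing norms, so the values actually stored in the index will be coarser than these.

```java
import java.util.Locale;

final class LengthNormDemo {
    // Same formula as the quoted DefaultSimilarity.computeNorm() line:
    // boost * (1 / sqrt(numTerms)).
    static float norm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        String[] titles = {"car parts", "car", "car shop"};
        for (String title : titles) {
            // Assumption: a whitespace split approximates the analyzer's term count here.
            int numTerms = title.trim().split("\\s+").length;
            System.out.printf(Locale.ROOT, "%-10s terms=%d norm=%.3f%n",
                    title, numTerms, norm(1.0f, numTerms));
        }
    }
}
```

With everything else equal, the one-term "car" document gets the larger norm (1.0 versus roughly 0.707), which is why the answer expects it to outrank the two-term titles.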


Answer by Emotional °昔 · #3 · 2019-04-07 18:03

Quick hack: after getting the ScoreDoc[] from IndexSearcher.search, re-sort it using score (descending) as the first criterion and field length (ascending) as the second.
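The re-sort can be sketched in plain Java. The Hit class below is a hypothetical stand-in for a ScoreDoc (which carries a doc id and a score) plus a field length you would have to look up yourself, for example from a stored field; that lookup is not shown, and the scores in main are made up to reproduce the tie from the question. The comparator puts higher scores first and breaks ties with the shorter field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ResortDemo {
    // Hypothetical stand-in for a ScoreDoc plus its field length (number of terms).
    static final class Hit {
        final int doc;
        final float score;
        final int length;

        Hit(int doc, float score, int length) {
            this.doc = doc;
            this.score = score;
            this.length = length;
        }

        @Override
        public String toString() {
            return "doc=" + doc + " score=" + score + " length=" + length;
        }
    }

    // Primary key: score, descending. Secondary key: field length, ascending.
    static final Comparator<Hit> SCORE_THEN_SHORTER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.length);

    // Returns a sorted copy; the input list is left untouched.
    static List<Hit> resort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_THEN_SHORTER);
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up scores: all tied, as described in the question.
        List<Hit> hits = Arrays.asList(
                new Hit(0, 0.5f, 2),   // "car parts"
                new Hit(1, 0.5f, 1),   // "car"
                new Hit(2, 0.5f, 2));  // "car shop"
        resort(hits).forEach(System.out::println);
    }
}
```

Because List.sort is stable, hits that tie on both score and length keep their original relative order.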
