How to get more out of Lucene.net

I'm trying to incorporate Lucene.net in my web search.

Currently I have a Lucene.net index that contains more than 1 million documents with 7 fields each. The last field is the "all" field, which holds the content of the other fields concatenated. Searching the "all" field is just EXTREMELY fast :)

But I feel there is more to be found here. How can I run a search for one or more space-separated strings over all the fields without using the "all" field?
I want to be able to give weights to certain fields. Furthermore, it would be really nice if the search result contained information on WHERE the hit took place, so I can show it in the result.

I think this is all possible, but I don't immediately see how.
Any help?

Tags: lucene.net

3 Answers

再贱就再见

#2 · 2019-03-28 03:38

I don't think you need to maintain an "all" field.

Have a look at "MultiFieldQueryParser". Rather than taking a single default field for the query parser, it accepts an array of field names (in addition to the index analyser).
Term boosts should work as they do with "QueryParser" (i.e. no special action required). I should add that the standard scoring (length of field, number of matches, etc.) has worked OK for me without boosted terms.
Lucene.Net (well, certainly the SVN 2.3 builds at the moment) includes a port of the Highlight package from the Java source. It has a couple of quirks (not least that it can be tricky to get going in the first place), but it basically works.
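
To make the "WHERE the hit took place" part concrete, here is a rough sketch in Python of the idea behind highlighting: find the field a term occurs in and return a short fragment with the match marked. This is not the Highlight package's API. The find_hits function, the dict-shaped document and the <b> markers are all made up for illustration, and the matching is a plain case-insensitive substring search rather than analyzer-based matching.

```python
import re


def find_hits(doc, terms, context=20):
    """Return (field, snippet) pairs showing where each term matched.

    doc is a hypothetical dict of field name -> stored text. Matching is a
    plain case-insensitive substring search, not Lucene analysis.
    """
    hits = []
    for field, text in doc.items():
        for term in terms:
            for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                # Keep up to `context` characters on either side of the match.
                start = max(0, m.start() - context)
                end = min(len(text), m.end() + context)
                snippet = (
                    text[start:m.start()]
                    + "<b>" + m.group(0) + "</b>"
                    + text[m.end():end]
                )
                hits.append((field, snippet))
    return hits


doc = {"title": "Red ribbon spool", "description": "Satin finish, 5 m"}
hits = find_hits(doc, ["ribbon"])
```

With the real highlighter you would feed it the parsed query and the stored text of each field you want to display, so the field name that produced a fragment is known in the same way.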

Good luck


我命由我不由天

#3 · 2019-03-28 03:42

We do something similar. The trick is to specify the fields in your query string:

(+Tier1:ribbon^1)^4 OR (+Tier2:ribbon^1)^4 OR (+Tier3:ribbon^1) OR (+Tier4:q*ribbon*^1)^12

In the above example, the user searched for "ribbon" in our application. We keep different segments of data in different fields, and the final field, "Tier4", contains all the previous terms concatenated together. We also prefix the content of that field with a "q", so the pattern starts with "q" rather than "*" and we can effectively do leading-wildcard searches:

(+Tier4:q*ribbon*^1)^12

Lastly, we apply boosts with the caret (^), which weights matches in some fields more heavily than in others. It took a while to get the boosts right, and I'm still not 100% happy with them, but they do make a big impact.
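
If you build such query strings from user input, a small helper keeps the field weights in one place. The sketch below is in Python and only produces the query text in the classic query parser syntax. The build_query name, the field names and the boost values are placeholders, requiring every word with AND is my assumption rather than something from the example above, and escaping of query-syntax characters in the user's input is left out.

```python
def build_query(terms, field_boosts):
    """Build a query string that looks for each term in every boosted field.

    field_boosts is a hypothetical mapping of field name -> boost factor.
    Terms are inserted as-is; escaping special characters is omitted here.
    """
    clauses = []
    for term in terms:
        # One OR group per term: the term may match in any of the fields.
        per_field = " OR ".join(
            f"{field}:{term}^{boost}" for field, boost in field_boosts.items()
        )
        clauses.append(f"({per_field})")
    # Assumption: every term must match somewhere, hence AND between groups.
    return " AND ".join(clauses)


user_input = "red ribbon"
query = build_query(user_input.split(), {"Tier1": 4, "Tier2": 4, "Tier3": 1})
```

The resulting string can then be handed to the query parser like any hand-written query.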


男人必须洒脱

#4 · 2019-03-28 03:51

You have to get Lucene in Action. Although it covers the original (that is, Java) Lucene implementation, it contains all the information you need about boosts, highlighters, query parsers, etc.
