How to sort by Lucene.Net field and ignore common

Posted 2019-06-01 01:44

I've found how to sort query results by a given field in a Lucene.Net index instead of by score; all it takes is a field that is indexed but not tokenized. However, what I haven't been able to figure out is how to sort that field while ignoring stop words such as "a" and "the", so that the following book titles, for example, would sort in ascending order like so:

  1. The Cat in the Hat
  2. Horton Hears a Who

Is such a thing possible, and if yes, how?

I'm using Lucene.Net 2.3.1.2.

5 answers
倾城 Initia
#2 · 2019-06-01 02:11

There seems to be a catch-22 in that you must tokenize a field with an analyzer in order to strip punctuation and stop words, but you can't sort on tokenized fields. How then to strip the stop words without tokenizing?

骚年 ilove
#3 · 2019-06-01 02:13

It's been a while since I used Lucene, but my guess would be to add an extra field just for sorting and store the value in it with the stop words already stripped. You can probably use the same analyzers to generate this value.
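
A minimal sketch of how such a sort value might be generated, written in Python rather than C# for brevity. The stop-word list and the `make_sort_key` name are assumptions for illustration, not part of Lucene.Net; the resulting string would go into the extra field, indexed but not tokenized.

```python
import re

# Hypothetical stop-word list; ideally it matches the analyzer used for searching.
STOP_WORDS = {"a", "an", "the", "in", "of"}

def make_sort_key(title: str) -> str:
    """Lowercase the title, drop punctuation and stop words, rejoin with spaces."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    # Fall back to all words if every word was a stop word (e.g. "The The").
    return " ".join(kept or words)

titles = ["Horton Hears a Who", "The Cat in the Hat"]
print(sorted(titles, key=make_sort_key))
```

Lowercasing the key also makes the sort case-insensitive, which is usually what is wanted for titles.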

聊天终结者
#4 · 2019-06-01 02:14

I wrap the results returned by Lucene into my own collection of custom objects. Then I can populate it with extra info/context information (and use things like the highlighter class to pull out a snippet of the matches), plus add paging. If you took a similar route you could create a "result" class/object, add something like a SortBy property and grab whatever field you wanted to sort by, strip out any stop words, then save it in this property. Now just sort the collection based on that property instead.
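
Roughly, that wrapper could look like the sketch below (Python for brevity; the `Result` class, its `sort_by` attribute, and the stop-word list are hypothetical names for illustration, and the hit titles stand in for whatever the search returned).

```python
from dataclasses import dataclass, field

# Hypothetical stop-word list used only for building the sort value.
STOP_WORDS = {"a", "an", "the"}

@dataclass
class Result:
    title: str
    sort_by: str = field(init=False)  # derived from title, never passed in

    def __post_init__(self):
        # Strip stop words once, when the result object is created.
        words = [w for w in self.title.lower().split() if w not in STOP_WORDS]
        self.sort_by = " ".join(words)

def sort_results(titles):
    """Wrap raw hit titles in Result objects and order them by sort_by."""
    return sorted((Result(t) for t in titles), key=lambda r: r.sort_by)

hits = ["The Cat in the Hat", "Horton Hears a Who", "A Wrinkle in Time"]
for result in sort_results(hits):
    print(result.title)
```

Note that sorting after retrieval only orders the hits that were actually loaded, so it needs care when combined with paging.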

祖国的老花朵
#5 · 2019-06-01 02:22

For searching, I found the article "search lucene .net index with sort option" interesting; it may help solve your problem.

叛逆
#6 · 2019-06-01 02:26

When you create your index, create a field that only contains the words you wish to sort on, then when retrieving, sort on that field but display the full title.
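
As an illustration of keeping two values per document, here is a small Python sketch that simulates the index with a list of dictionaries. It uses a simpler variation that drops only a leading article; the field names `title` and `title_sort` and the article list are assumptions, not Lucene.Net API.

```python
# Hypothetical leading articles to skip when building the sort value.
LEADING_ARTICLES = ("a ", "an ", "the ")

def sort_value(title: str) -> str:
    """Lowercase the title and drop one leading article, if present."""
    lowered = title.lower().strip()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered

def build_document(title: str) -> dict:
    # "title" is what gets displayed; "title_sort" stands in for the
    # untokenized field the search would sort on.
    return {"title": title, "title_sort": sort_value(title)}

index = [build_document(t) for t in
         ["Horton Hears a Who", "The Cat in the Hat", "A Wrinkle in Time"]]

for doc in sorted(index, key=lambda d: d["title_sort"]):
    print(doc["title"])
```

Because the sort value is computed once at indexing time, changing the rules for what to ignore means re-indexing the documents.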
