Lucene.net and partial “starts with” phrase search

Posted 2019-05-10 01:03

I'm looking to build an auto-complete textbox over a large set of city names. I want a "starts with" search over a multi-word phrase: for example, if the user has typed "chicago he", only locations such as "Chicago Heights" should be returned.
I'm trying to use Lucene for this, but I'm having trouble understanding how it should be implemented.

I've tried what I think is the approach that should work:

I've indexed locations with KeywordAnalyzer (I've tried both TOKENIZED and UN_TOKENIZED):

doc.Add(new Field("Name", data.ToLower(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO));

And search for them via the following (I've also tried a variety of other queries/analyzers/etc):

var luceneQuery = new BooleanQuery();
var wildcardQuery = new WildcardQuery(new Term("Name", "chicago hei*"));
luceneQuery.Add(wildcardQuery, BooleanClause.Occur.MUST);

I'm not getting any results. Would appreciate any advice.

2 answers
我只想做你的唯一
#2 · 2019-05-10 01:32

On a tokenized field, the only way to guarantee a "starts with" search is to put a delimiter at the beginning of the indexed string, so "diamond ring" is indexed as "lucenedelimiter diamond ring lucenedelimiter" and the search phrase is prefixed with the same delimiter. This prevents "the famous Diamond Ridge Resort" from turning up in a search for "diamond ri*".
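To see why the delimiter helps, here is a small sketch in plain C# that imitates phrase matching over whitespace-separated tokens. It does not call Lucene.net: `Tokenize`, `Wrap`, and `ContainsPhrasePrefix` are hypothetical helpers written only for illustration, and a real index would still need a phrase-style query that allows a prefix on its last term.

```csharp
using System;

public static class DelimiterSketch
{
    // Sentinel token from the answer above; it must never occur in real data.
    public const string Delimiter = "lucenedelimiter";

    // Stand-in for an analyzer (assumption): lowercase, then split on spaces.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Wrap a value in sentinel tokens before "indexing" it.
    public static string Wrap(string value) => Delimiter + " " + value + " " + Delimiter;

    // True if the query tokens occur consecutively in the document tokens,
    // with the last query token treated as a prefix (like "ri*").
    public static bool ContainsPhrasePrefix(string[] doc, string[] query)
    {
        if (query.Length == 0) return false;
        for (int start = 0; start + query.Length <= doc.Length; start++)
        {
            bool match = true;
            for (int i = 0; i < query.Length && match; i++)
            {
                bool last = i == query.Length - 1;
                match = last
                    ? doc[start + i].StartsWith(query[i], StringComparison.Ordinal)
                    : doc[start + i] == query[i];
            }
            if (match) return true;
        }
        return false;
    }

    public static void Main()
    {
        string[] names = { "diamond ring", "the famous Diamond Ridge Resort" };
        foreach (string name in names)
        {
            // Without the sentinel, the phrase can match anywhere in the field.
            bool plain = ContainsPhrasePrefix(Tokenize(name), Tokenize("diamond ri"));
            // With the sentinel in both the indexed value and the query,
            // the phrase can only match at the start of the field.
            bool anchored = ContainsPhrasePrefix(
                Tokenize(Wrap(name)), Tokenize(Delimiter + " diamond ri"));
            Console.WriteLine($"{name}: plain={plain}, anchored={anchored}");
        }
    }
}
```

In this simulation both names match the unanchored phrase, but only "diamond ring" matches once the sentinel is part of the indexed value and the query.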

太酷不给撩
#3 · 2019-05-10 01:49

To do that, you need to index your field with the Field.Index.NOT_ANALYZED setting, which is the same as the UN_TOKENIZED you used, so it should work. Here's a working sample I quickly put together to test it. I'm using the latest version available on NuGet.

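// Written against the Lucene.Net 2.9.x API; the Hits class used below is deprecated there and absent from 3.x.
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Create (or overwrite) an index on disk.
IndexWriter iw = new IndexWriter(@"C:\temp\sotests", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true);

// NOT_ANALYZED stores the whole value as a single term, spaces included.
Document doc = new Document();
Field loc = new Field("location", "", Field.Store.YES, Field.Index.NOT_ANALYZED);
doc.Add(loc);

// Reuse the same Document and Field, changing only the value for each entry.
loc.SetValue("chicago heights");
iw.AddDocument(doc);

loc.SetValue("new-york");
iw.AddDocument(doc);

loc.SetValue("chicago low");
iw.AddDocument(doc);

loc.SetValue("montreal");
iw.AddDocument(doc);

loc.SetValue("paris");
iw.AddDocument(doc);

iw.Commit();

IndexSearcher ins = new IndexSearcher(iw.GetReader());

// Only "chicago heights" starts with "chicago he".
WildcardQuery query = new WildcardQuery(new Term("location", "chicago he*"));

var hits = ins.Search(query);

for (int i = 0; i < hits.Length(); i++)
    Console.WriteLine(hits.Doc(i).GetField("location").StringValue());

Console.WriteLine("---");

// Both "chicago heights" and "chicago low" start with "chic".
query = new WildcardQuery(new Term("location", "chic*"));
hits = ins.Search(query);

for (int i = 0; i < hits.Length(); i++)
    Console.WriteLine(hits.Doc(i).GetField("location").StringValue());

iw.Close();
Console.ReadLine();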
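
The reason NOT_ANALYZED matters can be sketched without Lucene at all: a wildcard query is compared against each individual term in the index, never against the original field text. The plain C# below is an illustration only. `WildcardMatches` and the two term-building helpers are made up for this example and merely imitate what an untokenized field and a word-tokenized field would hold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermMatchSketch
{
    // Handles only a single trailing '*', which is all this example needs.
    public static bool WildcardMatches(string pattern, string term)
    {
        if (pattern.EndsWith("*", StringComparison.Ordinal))
        {
            string prefix = pattern.Substring(0, pattern.Length - 1);
            return term.StartsWith(prefix, StringComparison.Ordinal);
        }
        return term == pattern;
    }

    // NOT_ANALYZED-style (assumption): the whole value becomes one term.
    public static List<string> AsSingleTerm(string value) => new List<string> { value };

    // Tokenized-style (assumption): one term per space-separated word.
    public static List<string> AsWordTerms(string value) =>
        value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

    // A document matches if any one of its terms matches the pattern.
    public static bool AnyTermMatches(string pattern, IEnumerable<string> terms) =>
        terms.Any(t => WildcardMatches(pattern, t));

    public static void Main()
    {
        const string value = "chicago heights";
        const string pattern = "chicago he*";

        // The single term "chicago heights" starts with "chicago he".
        Console.WriteLine(AnyTermMatches(pattern, AsSingleTerm(value))); // True
        // Neither "chicago" nor "heights" starts with "chicago he".
        Console.WriteLine(AnyTermMatches(pattern, AsWordTerms(value)));  // False
    }
}
```

For a pure prefix like this one, Lucene.net also provides PrefixQuery, which takes the term without the trailing `*`.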