""black lab*" "pet shop""~5 in Lucene (proximity s

Posted 2019-05-29 20:00

How can I do a proximity search for two multi-word phrases in Lucene? For example, I want to find all matches of black lab* (black labrador, black labradoodle, etc.) within 5 words of the phrase "pet shop". Which analyzer should I be using? Which query parser would be recommended? I'm working with Lucene.NET. I've ported the ComplexPhraseQueryParser from Java to C#, but that parser doesn't seem to be doing the trick (or perhaps I'm just using it wrong). I'm just getting started with Lucene, so your help is much appreciated.

2 Answers
Rolldiameter
#2 · 2019-05-29 20:52

You can use a SpanQuery for this:

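// Outer query: the two phrases, in this order, with a slop of 5 between them.
new SpanNearQuery(
    new SpanQuery[] {
        // "black lab*": "black" immediately followed by a term starting with "lab".
        new SpanNearQuery(
            new SpanQuery[] {
                new SpanTermQuery(new Term(FIELD, "black")),
                new SpanMultiTermQueryWrapper<WildcardQuery>(new WildcardQuery(new Term(FIELD, "lab*"))),
            },
            0,      // slop 0: the two terms must be adjacent
            true),  // inOrder
        // "pet shop": "pet" immediately followed by "shop".
        new SpanNearQuery(
            new SpanQuery[] {
                new SpanTermQuery(new Term(FIELD, "pet")),
                new SpanTermQuery(new Term(FIELD, "shop")),
            },
            0,
            true),
    },
    5,      // slop between the two phrases
    true);  // inOrder: pass false to also match "pet shop" before "black lab*"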
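
To make the nesting easier to reason about, here is a rough plain-Java model of what the query above asks for. It does not use Lucene at all: the class `SpanNearSketch`, its methods, and the lowercase/split tokenization are all made up for illustration, and it assumes the ordered slop is counted as the number of positions between the end of the first phrase and the start of the second. Treat it as a sketch of the idea, not as how Lucene implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Plain-Java model of the nested, ordered span-near logic. Not Lucene code:
// every name here is hypothetical and exists only for illustration.
final class SpanNearSketch {

    // A matched region of the token stream: start inclusive, end exclusive.
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Inner query (slop 0, in order): a token accepted by `first`
    // immediately followed by a token accepted by `second`.
    static List<Span> adjacent(String[] tokens, Predicate<String> first, Predicate<String> second) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (first.test(tokens[i]) && second.test(tokens[i + 1])) {
                spans.add(new Span(i, i + 2));
            }
        }
        return spans;
    }

    // Outer query (in order): some left span ends at most `slop`
    // positions before some right span starts.
    static boolean nearInOrder(List<Span> left, List<Span> right, int slop) {
        for (Span l : left) {
            for (Span r : right) {
                int gap = r.start - l.end;
                if (gap >= 0 && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    // Assumed analysis: lowercase, then split on anything that is not a-z.
    static boolean matches(String text, int slop) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        List<Span> blackLab = adjacent(tokens, "black"::equals, t -> t.startsWith("lab"));
        List<Span> petShop = adjacent(tokens, "pet"::equals, "shop"::equals);
        return nearInOrder(blackLab, petShop, slop);
    }

    public static void main(String[] args) {
        System.out.println(matches("The black labrador sat outside the pet shop", 5));
        System.out.println(matches("The pet shop sells a black labrador", 5));
    }
}
```

In this model the first sentence matches (three words sit between "black labrador" and "pet shop"), while the second does not, because the phrases appear in the opposite order. Dropping the `gap >= 0` requirement roughly corresponds to passing false for inOrder on the outer query.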

The default Lucene QueryParser doesn't support span queries, but you could try the Surround query parser. I couldn't find much else in the way of documentation.

You may also find this answer and this blog post useful.

贼婆χ
#3 · 2019-05-29 21:04

You just need to set the slop.
