I have these search strings:
Tulip INN Riyadhh
Tulip INN Riyadhh LUXURY
Suites of Tulip INN RIYAHdhh
I need a search term such that, if I specify
*Tulip INN Riyadhh*
it returns all three of the above. I have a restriction: I have to achieve this without QueryParser or an Analyzer; it has to use only BooleanQuery, WildcardQuery, etc.
Regards, Raghavan
What you need here is a PhraseQuery. Let me explain.

I don't know which analyzer you're using, but I'll suppose you have a very basic one for simplicity, that just converts text to lowercase. Don't tell me you're not using an analyzer, since it's mandatory for Lucene to do any work, at least at the indexing stage: this is what defines the tokenizer and the token filter chain.
Here's how your strings would be tokenized in this example:
tulip inn riyadhh
tulip inn riyadhh luxury
suites of tulip inn riyadhh
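As a rough illustration of that tokenization step, here is a toy model in plain Python. It is not Lucene.Net code and not a real analyzer: `lowercase_tokens` is a hypothetical helper that only splits on whitespace and lowercases. The third sample string is written with the spelling "Riyadhh", on the assumption that the variant in the question is a typo.

```python
def lowercase_tokens(text):
    """Toy stand-in for a basic analyzer: split on whitespace, lowercase each piece."""
    return [piece.lower() for piece in text.split()]


# Sample strings from the question (third one normalized, see the assumption above).
documents = [
    "Tulip INN Riyadhh",
    "Tulip INN Riyadhh LUXURY",
    "Suites of Tulip INN Riyadhh",
]

for doc in documents:
    print(lowercase_tokens(doc))
```

A real analyzer may also strip punctuation, drop stop words such as "of", or apply stemming, so the tokens in an actual index can differ from this sketch.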
Notice how these all contain the token sequence tulip inn riyadhh. A sequence of tokens is what a PhraseQuery is looking for.

In Lucene.Net, building such a query amounts to creating a PhraseQuery and calling Add with one Term per token, in order, on the field you indexed (untested).
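To make the matching rule concrete, here is a toy model in plain Python rather than Lucene.Net code. The helpers `tokens` and `contains_phrase` are hypothetical names; the sketch assumes whitespace tokenization with lowercasing and a phrase with no slop, meaning the tokens must appear consecutively and in order.

```python
def tokens(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs in doc_tokens as a consecutive, ordered run."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


# The phrase terms are already lowercase, mirroring what the analyzer produced.
phrase = ["tulip", "inn", "riyadhh"]

docs = [
    "Tulip INN Riyadhh",
    "Tulip INN Riyadhh LUXURY",
    "Suites of Tulip INN Riyadhh",
    "Riyadhh INN Tulip",  # same words, different order: not a phrase match
]

matches = [d for d in docs if contains_phrase(tokens(d), phrase)]
print(matches)
```

The last document contains all three words but in a different order, so it is not a match in this model.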
Note that the terms need to match those produced by the analyzer (in this example, they're all lowercase). The QueryParser does this job for you by running parts of the query through the analyzer, but you'll have to do it yourself if you don't use the parser.

Now, why wouldn't WildcardQuery or RegexQuery work in this situation? These queries always match a single term, yet you need to match an ordered sequence of terms. For instance, a WildcardQuery with the term Riyadhh* would find all words starting with Riyadhh.

A BooleanQuery with a collection of TermQuery MUST clauses would match any text that happens to contain these 3 terms in any order, which is not exactly what you want either.

Lucas has the right idea, but there is a more specialized MultiPhraseQuery that can be used to build up a query based on the data that is already in the index to get a prefix match, as demonstrated in this unit test. The documentation of MultiPhraseQuery describes it as a generalized version of PhraseQuery that also accepts an array of terms for a position: to search for a phrase such as "Microsoft app*", you add the term "Microsoft", then find all indexed terms that have the prefix "app" and add them together as a Term array.

As Lucas pointed out, a *something WildcardQuery is the way to do the suffix match, provided you understand the performance implications.

They can then be combined with a BooleanQuery to get the result you want.

WriteIndex
GetPrefixTerms

Here we scan the index to find all of the terms that start with the passed-in prefix. The terms are then added to the MultiPhraseQuery.

ExecuteSearch

SearchResult
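The prefix-expansion idea described under GetPrefixTerms can be modeled without Lucene. The following is a toy sketch in plain Python, not the Lucene.Net API: `prefix_terms` and `matches_multi_phrase` are hypothetical helpers, the term list stands in for the index's term dictionary, and the sample documents are made up for illustration.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms, prefix):
    """Collect every term that starts with prefix from a sorted term list."""
    found = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        found.append(term)
    return found


def matches_multi_phrase(doc_tokens, slots):
    """slots[k] is the set of terms accepted at offset k of the phrase."""
    n = len(slots)
    if n == 0:
        return False
    return any(
        all(doc_tokens[i + k] in slots[k] for k in range(n))
        for i in range(len(doc_tokens) - n + 1)
    )


# Hypothetical documents, already tokenized and lowercased.
docs = [
    ["tulip", "inn", "riyadhh"],
    ["tulip", "inn", "riyadh", "palace"],
    ["tulip", "inn", "jeddah"],
]
index_terms = sorted({t for d in docs for t in d})

# Model of the phrase "tulip inn riy*": the last slot holds all expanded terms.
slots = [{"tulip"}, {"inn"}, set(prefix_terms(index_terms, "riy"))]
hits = [d for d in docs if matches_multi_phrase(d, slots)]
print(hits)
```

The expansion happens before the search runs, which is why the query depends on what is already in the index.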
If this seems cumbersome, note that QueryParser can mimic a "SQL LIKE" query. As pointed out here, there is an option to AllowLeadingWildcard on QueryParser to build up the correct query sequence easily. It is unclear why you have a constraint that you can't use it, as it is definitely the simplest way to get the job done.