Keyword (OR, AND) search in Lucene

Posted 2019-03-09 20:24

I am using Lucene in my portal (J2EE based) for indexing and search services.

The problem concerns Lucene's reserved keywords. When a search query consists of one of them, or starts or ends with one, the parser throws an error.

For example:

searchTerms = "ik OR jij"

This works fine, because it will search for "ik" or "jij"

searchTerms = "ik AND jij"

This works fine, it searches for "ik" and "jij"

But when you search:

searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"

Etc., it will fail with an error:

Component Name: STSE_RESULTS  Class: org.apache.lucene.queryParser.ParseException  Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0. 
Was expecting one of: 
... 

That makes sense, because these words are probably reserved by Lucene and are treated as operators.

In Dutch, the word "OR" is important because it stands for "Ondernemings Raad". It is used in many texts, and it needs to be findable. Searching for "or" does work, but it does not return texts matching the term "OR". How can I make it searchable?

How can I escape the keyword "or"? Or, how can I tell Lucene to treat "or" as a search term rather than as a keyword?

Tags: java lucene
6 answers
ら.Afraid
#2 · 2019-03-09 20:49

You can escape the "OR" when it's a search term, or write your own query parser for a different syntax. Lucene offers an extensive query API in addition to the parser, with which you can support your own query syntax quite easily.

Root(大扎)
#3 · 2019-03-09 20:50

I have read your question many times! =[

Please look at these suggestions.

How is your index stored?

A field in a document can be:

1) Stored 2) Tokenized 3) Indexed 4) Stored with a term vector

These settings can make a significant difference to what a search finds.

Please use Luke; it shows how your index is actually stored.

Luke is a must-have if you are working with Lucene, because it gives you a real picture of how the index is stored, and it also offers search. Try it and let us know with an update!

时光不老,我们不散
#4 · 2019-03-09 21:08

Wrapping OR and AND in double quotes works for me. Try a Java string like:

String query = "field:\"AND\"";
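A minimal sketch of the same quoting idea, using only plain Java string handling. The class QuotedTerm and its methods are hypothetical helpers, not part of Lucene; the assumption is that the resulting string is then handed to the query parser, where a double-quoted OR or AND is read as text rather than as an operator. Backslashes and double quotes inside the term are backslash-escaped so they cannot end the quoted string early.

```java
// Hypothetical helper (not a Lucene class): builds field:"term" query strings.
class QuotedTerm {

    // Wrap a term in double quotes, escaping embedded backslashes and quotes.
    static String quote(String term) {
        String escaped = term.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Build a query string of the form field:"term".
    static String fieldQuery(String field, String term) {
        return field + ":" + quote(term);
    }

    public static void main(String[] args) {
        // prints field:"AND"
        System.out.println(fieldQuery("field", "AND"));
    }
}
```

Whether the quoted term then matches anything still depends on the analyzer used at index and query time, which is worth checking in Luke.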

Viruses.
#5 · 2019-03-09 21:08

You're probably doing something wrong when building the query. I'll second Narayan's suggestion (posted in the comments) to get Luke and try running your queries with it. It has been a little while since I used Lucene, but I don't remember ever having issues with OR and AND.

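Other than that, you can try escaping the input string with QueryParser.escape(userQuery). Note that escape() backslash-escapes special characters such as + - ( ) and : and, as far as I know, leaves the words AND, OR and NOT unchanged, so it may not be enough on its own.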

More On Escaping

放荡不羁爱自由
#6 · 2019-03-09 21:09

OR, NOT and AND are reserved keywords. I solved this problem just two days ago by lower-casing those three words in the user's search terms before feeding them into the Lucene query parser. Note that if you search and replace these keywords, make sure you use word boundaries (\b) so you don't end up changing words such as ANDROID and ORDER.

I then let the user specify NOT and AND by using - and +, just like Google does.
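The lower-casing step could look roughly like the sketch below. It uses only java.util.regex; the class and method names are hypothetical, and the assumption is that the returned string is what gets passed to the query parser afterwards.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of Lucene.
class ReservedWordLowercaser {

    // \b anchors each match to word boundaries, so ANDROID and ORDER are left alone.
    private static final Pattern RESERVED = Pattern.compile("\\b(AND|OR|NOT)\\b");

    // Lower-case every stand-alone AND, OR and NOT in the user's input.
    static String lowerCaseReservedWords(String userQuery) {
        Matcher m = RESERVED.matcher(userQuery);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The replacement is "and", "or" or "not", so it needs no quoting.
            m.appendReplacement(out, m.group().toLowerCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints ik or jij
        System.out.println(lowerCaseReservedWords("ik OR jij"));
    }
}
```

The trade-off is that upper-case AND, OR and NOT no longer act as operators at all, which is why + and - take over that role.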

可以哭但决不认输i
#7 · 2019-03-09 21:13

I suppose you have tried putting the "OR" into double quotes?

If that doesn't work, I think you might have to go as far as changing the Lucene source and recompiling it, because the operator "OR" is buried deep inside the code. Actually, recompiling alone probably isn't enough: you'll have to change the file QueryParser.jj in the source package, which serves as input for JavaCC, then run JavaCC, then recompile the whole thing.

The good news, however, is that there's only one line to change:

| <OR: ("OR" | "||") >

becomes

| <OR: ("||") >

That way, you'll have only "||" as the logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.

This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)
