Lucene error while parsing Query: Cannot parse 

Posted 2019-07-31 16:39

Question:

I want to parse some text with the Lucene query parser to carry out basic preprocessing on it. I used the following lines of code:

Analyzer analyzer = new EnglishAnalyzer();
QueryParser parser = new QueryParser("", analyzer);
String text = "...";
String ret = parser.parse(QueryParser.escape(text)).toString();

But I am getting an error:

Exception in thread "main" org.apache.lucene.queryparser.classic.ParseException: Cannot parse '': Encountered "<EOF>" at line 1, column 0.

Answer 1:

Using QueryParser.escape() escapes the special characters (it prefixes them with a backslash). However, it doesn't touch the words

AND, NOT, OR

which are operator keywords in Lucene's query syntax.

There are two ways to deal with this:

  1. Replace AND, NOT, OR in the query string.
  2. Convert the query string to lower case.

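Converting to lower case resolves the issue because only the upper-case AND, NOT and OR are keywords; in lower case they are treated as regular words. Since EnglishAnalyzer lowercases every token anyway, this does not change how the rest of the text is analyzed.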
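Both workarounds can be sketched in plain Java, applied to the text before it reaches QueryParser. This is a minimal sketch under assumptions: the class and method names (QueryTextSanitizer, neutralizeOperators, lowerCaseAll) are hypothetical and not part of Lucene, and the cleaned string is meant to be passed to parser.parse(QueryParser.escape(...)) exactly as in the question. The regular expressions use \b word boundaries so that only standalone upper-case keywords are rewritten, leaving words such as ANDROID or NOTES untouched.

```java
import java.util.Locale;

// Hypothetical helper (not part of Lucene): neutralizes the classic
// QueryParser's word operators before the text is escaped and parsed.
final class QueryTextSanitizer {

    private QueryTextSanitizer() {}

    // Way 1: rewrite only standalone upper-case AND, OR, NOT as lower case.
    // The \b word boundaries keep words such as ANDROID or NOTES unchanged.
    static String neutralizeOperators(String text) {
        return text.replaceAll("\\bAND\\b", "and")
                   .replaceAll("\\bOR\\b", "or")
                   .replaceAll("\\bNOT\\b", "not");
    }

    // Way 2: lower-case the whole text; Locale.ROOT avoids locale-specific
    // surprises such as the Turkish dotless i.
    static String lowerCaseAll(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String text = "Cats AND dogs, NOT birds";
        System.out.println(neutralizeOperators(text)); // Cats and dogs, not birds
        System.out.println(lowerCaseAll(text));        // cats and dogs, not birds
        // The result would then go through QueryParser.escape(...) and
        // parser.parse(...) as in the question (requires Lucene on the classpath).
    }
}
```

The regex variant rewrites only the three operator words, which matters if the text is later analyzed by an analyzer that does not lowercase on its own; the toLowerCase variant is simpler.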



Answer 2:

For those who face this problem: I realized that my parser throws an exception for the word "NOT", even after escaping. I had to manually replace it with another word.