I want to parse some text with the Lucene query parser to carry out basic preprocessing on it. I used the following lines of code:
Analyzer analyzer = new EnglishAnalyzer();
QueryParser parser = new QueryParser("", analyzer);
String text = "...";
String ret = parser.parse(QueryParser.escape(text)).toString();
But I am getting an error:
Exception in thread "main" org.apache.lucene.queryparser.classic.ParseException: Cannot parse '': Encountered "<EOF>" at line 1, column 0.
For those who face this problem: I realized that my parser throws an exception for the word "NOT", even after escaping. I had to manually replace it with another word.
Using QueryParser.escape() escapes the special characters. However, it doesn't escape AND, OR, and NOT, which are keywords in Lucene query syntax.
There are two ways to deal with it:
Converting the text to lower case resolves the issue, because only the capitalized AND, NOT, and OR are keywords. In lower case they are treated as regular words.
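The lower-casing step can be sketched roughly as follows. This is a minimal illustration, not part of Lucene: the class name LuceneKeywordSanitizer and the method lowercaseOperators are hypothetical, and it assumes Java 9 or later. Unlike lower-casing the whole string with text.toLowerCase(), it only rewrites AND, OR, and NOT when they appear as whole upper-case words, which may be preferable if the rest of the text is case-sensitive.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it is not a Lucene class.
class LuceneKeywordSanitizer {
    // Matches AND, OR, NOT only as whole, upper-case words,
    // so words such as "ANDROID" or "NOTE" are left alone.
    private static final Pattern OPERATORS = Pattern.compile("\\b(AND|OR|NOT)\\b");

    // Returns the text with the operator keywords lower-cased;
    // everything else is left unchanged.
    static String lowercaseOperators(String text) {
        return OPERATORS.matcher(text)
                .replaceAll(match -> match.group().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String text = "cats AND dogs, NOT birds";
        System.out.println(lowercaseOperators(text)); // prints "cats and dogs, not birds"
    }
}
```

The cleaned string can then go through QueryParser.escape() and parser.parse() as in the question.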