What is the best way to find out which terms in a query matched a given document returned as a hit in Lucene?
I have tried a roundabout method using the hit highlighting package in Lucene contrib, and also a method that searches for every word in the query against the topmost document ("docId: xy AND description: each_word_in_query").
Neither gives satisfactory results. Hit highlighting does not report some of the words that matched for documents other than the first one, and I'm not sure the second approach is the best alternative.
The explain method of the Searcher is a nice way to see which parts of a query matched and how each part affects the overall score.
Example taken from the book Lucene in Action, 2nd Edition:
This will explain the score of each document that matches the query.
I have not tried it yet, but have a look at the implementation of org.apache.lucene.search.highlight.QueryTermExtractor.
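To make the idea concrete without tying it to a particular Lucene version, here is a rough plain-Java sketch of what the question is after: take the terms pulled out of the query and check which of them occur in the text of the hit. It does not call Lucene at all. MatchedTerms, tokenize and matchedTerms are made-up names, and the lowercase-and-split tokenizer is only an assumed stand-in for whatever Analyzer the index was built with. In real code the term list would come from the query (for instance through QueryTermExtractor) and the text from the document's stored field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MatchedTerms {

    // Assumed stand-in for an Analyzer: lowercase the text and split on
    // anything that is not a letter or a digit.
    public static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns the query terms that occur in the document text,
    // in query order and without duplicates.
    public static List<String> matchedTerms(List<String> queryTerms, String documentText) {
        Set<String> docTokens = tokenize(documentText);
        Set<String> matched = new LinkedHashSet<>();
        for (String term : queryTerms) {
            String normalized = term.toLowerCase(Locale.ROOT);
            if (docTokens.contains(normalized)) {
                matched.add(normalized);
            }
        }
        return new ArrayList<>(matched);
    }

    public static void main(String[] args) {
        // Hypothetical query terms and field text, for illustration only.
        List<String> queryTerms = Arrays.asList("lucene", "highlight", "solr");
        String description = "Hit highlighting in Lucene uses the Highlight package.";
        System.out.println(matchedTerms(queryTerms, description)); // prints [lucene, highlight]
    }
}
```

Note that "highlighting" does not count as a match for "highlight" here because no stemming is applied. With a stemming Analyzer the result would differ, which is one reason to run both the query terms and the document text through the same analysis chain.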