Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. So far I've been able to use Lucene for:
- Searching for complex phrases with some fuzziness
- Highlighting results
However, I'm not sure how to:
- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)
I have tried using the explain() method, but it only reports which query terms produced the hit, not the offsets of the hit within the original text.
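To make the goal concrete, the desired output looks roughly like what the following plain-Java sketch produces. It does not use Lucene at all and only does exact, case-insensitive substring matching (no fuzziness); the names DictionaryAnnotator and Annotation and the dictionary layout (term mapped to entity type) are hypothetical, made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: plain JDK, no Lucene, exact matching.
final class DictionaryAnnotator {

    /** One annotated hit: character offsets into the text plus its entity type. */
    static final class Annotation {
        final int start;          // inclusive offset in the original text
        final int end;            // exclusive offset in the original text
        final String term;        // the matched text as it appears in the input
        final String entityType;  // entity-specific label from the dictionary

        Annotation(int start, int end, String term, String entityType) {
            this.start = start;
            this.end = end;
            this.term = term;
            this.entityType = entityType;
        }
    }

    /**
     * Finds every case-insensitive occurrence of each dictionary term.
     * Assumption: lower-casing does not change the string length
     * (true for ASCII text), so offsets map back to the original text.
     */
    static List<Annotation> annotate(String text, Map<String, String> dictionary) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<Annotation> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String needle = entry.getKey().toLowerCase(Locale.ROOT);
            if (needle.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            int from = haystack.indexOf(needle);
            while (from >= 0) {
                int end = from + needle.length();
                hits.add(new Annotation(from, end, text.substring(from, end), entry.getValue()));
                from = haystack.indexOf(needle, end); // continue after this match
            }
        }
        // Report hits in the order they appear in the text.
        hits.sort(Comparator.comparingInt((Annotation a) -> a.start));
        return hits;
    }
}
```

For the text "Acme Corp hired John Smith in Berlin." and a dictionary mapping "acme corp" to ORGANIZATION and "berlin" to LOCATION, this yields two annotations: offsets 0 to 9 tagged ORGANIZATION and offsets 30 to 36 tagged LOCATION. What I'm missing is how to get the same kind of result out of a Lucene query with fuzzy phrase matching.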
Has anybody faced a similar problem and is willing to share a potential solution?
Thank you in advance for your help!
For the offset, see this question: How get the offset of term in Lucene?
I don't quite understand your second question. It sounds to me like you want to get the data from a stored field though. To get the data from a stored field: