Lucene Entity Extraction

Posted 2019-05-06 19:44

Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. So far I've been able to use Lucene for:
- Searching for complex phrases with some fuzziness
- Highlighting results

However, I don't know how to:
- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)

I have tried using the explain() method, but it only reports which query terms produced the hit, not the offsets of the hit within the original text.

Has anybody faced a similar problem and is willing to share a potential solution?

Thank you in advance for your help!

Tags: lucene text-mining information-extraction lucene-highlighter

1 answer

SAY GOODBYE

Post #2 · 2019-05-06 20:12

For the offset, see this question: How get the offset of term in Lucene?

I don't quite understand your second question. It sounds to me like you want to get the data from a stored field though. To get the data from a stored field:

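// Lucene.NET (C#). Member casing such as scoreDocs/doc may differ between Lucene.NET versions.
TopDocs results = searcher.Search(query, filter, num);
foreach (ScoreDoc result in results.scoreDocs)
{
    // Load the stored document for this hit by its internal document id.
    Document resultDoc = searcher.Doc(result.doc);
    // Read the stored value of the field; this is null if the field does not exist in the document.
    string valOfField = resultDoc.Get("My Field");
}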
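
If what you are after is one annotation per match, with character offsets and an entity type attached, the shape of the output might look like the sketch below. This is plain Java and deliberately does not use Lucene: DictionaryTagger, Annotation and tag are hypothetical names made up for illustration, and the matching is an exact, case-insensitive substring scan with no fuzziness or word-boundary checks. In Lucene itself, per-token offsets are exposed through OffsetAttribute (startOffset() and endOffset()), so the same annotation structure could presumably be filled from those instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: finds dictionary phrases in a text
// and emits one annotation (offsets + entity type) per match.
class DictionaryTagger {

    // One annotation per match: [start, end) character offsets into the
    // original text, the matched surface text, and the entity type.
    static final class Annotation {
        final int start;
        final int end;
        final String text;
        final String entityType;

        Annotation(int start, int end, String text, String entityType) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.entityType = entityType;
        }
    }

    // dictionary maps a phrase to its entity type (illustrative structure).
    static List<Annotation> tag(String text, Map<String, String> dictionary) {
        List<Annotation> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String phrase = entry.getKey();
            if (phrase.isEmpty()) {
                continue; // an empty phrase would match everywhere
            }
            int i = 0;
            while (i + phrase.length() <= text.length()) {
                // regionMatches compares in place, so offsets stay aligned
                // with the original text even when case differs.
                if (text.regionMatches(true, i, phrase, 0, phrase.length())) {
                    int end = i + phrase.length();
                    out.add(new Annotation(i, end, text.substring(i, end), entry.getValue()));
                    i = end; // skip past this match (no overlaps per phrase)
                } else {
                    i++;
                }
            }
        }
        // Report matches in document order.
        out.sort((a, b) -> Integer.compare(a.start, b.start));
        return out;
    }
}
```

For example, tagging "Apache Lucene was created by Doug Cutting." with the dictionary {"Lucene" -> "SOFTWARE", "doug cutting" -> "PERSON"} gives two annotations: offsets 7 to 13 typed SOFTWARE, and offsets 29 to 41 typed PERSON.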
