Getting the Doc ID in Lucene

Posted 2020-04-11 18:32

In Lucene, I can do the following:

doc.GetField("mycustomfield").StringValue();

This retrieves the value of a field from a document in the index.

My question: for the same 'doc', is there a way to get the document ID? Luke displays it, so there must be a way to retrieve it. I need it to delete documents on updates.

I scoured the docs but have not found a field name to pass to GetField, or another method that returns the ID.

2 answers
淡お忘
Reply #2 · 2020-04-11 18:55

Turns out you have to do this:

var hits = searcher.Search(query);
var result = hits.Id(0);

As opposed to

var results = hits.Doc(i);
var docid = results.<...> //there's nothing I could find there to do this
够拽才男人
Reply #3 · 2020-04-11 18:57

I suspect the reason you're having trouble finding any documentation on determining the id of a particular Lucene Document is that they are not truly "id"s. In other words, they are not meant to be looked up and stored for later use. If you do store them, you will not get the results you were hoping for, because the IDs change when the index is optimized.

Instead, think of the IDs as the current "offset" of a particular document from the start of the index, which will change when deleted documents are physically removed from the index files.
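To make the "offset" idea concrete, here is a minimal sketch in plain C#. It is a toy model and does not use Lucene.NET at all: the `ToyIndex` class and its `Add`, `Delete`, `Compact` and `IdOf` methods are hypothetical names invented for this illustration, with `Compact` standing in for whatever step physically removes deleted documents.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for an index; NOT Lucene.NET code, all names are hypothetical.
public class ToyIndex
{
    // Parallel lists: the key stored at each position, and its deleted flag.
    private readonly List<string> keys = new List<string>();
    private readonly List<bool> deleted = new List<bool>();

    // Appends a document and returns its current position (its "doc id").
    public int Add(string key)
    {
        keys.Add(key);
        deleted.Add(false);
        return keys.Count - 1;
    }

    // Marks a document as deleted; the other documents keep their positions.
    public void Delete(int docId)
    {
        deleted[docId] = true;
    }

    // Physically removes deleted documents, so later documents shift down.
    public void Compact()
    {
        for (int i = keys.Count - 1; i >= 0; i--)
        {
            if (deleted[i])
            {
                keys.RemoveAt(i);
                deleted.RemoveAt(i);
            }
        }
    }

    // Returns the current position of the live document with this key, or -1.
    public int IdOf(string key)
    {
        for (int i = 0; i < keys.Count; i++)
        {
            if (!deleted[i] && keys[i] == key)
            {
                return i;
            }
        }
        return -1;
    }
}
```

If you add "a", "b" and "c", then delete "b", the document "c" is still at position 2. Only after `Compact()` does it move to position 1, which is why a position saved earlier can end up pointing at the wrong document. The key stored with the document, by contrast, still finds it.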

Now with that said, the proper way to look up the "id" of a document is:


QueryParser parser = new QueryParser(...);
IndexSearcher searcher = new IndexSearcher(...);
Hits hits = searcher.Search(parser.Parse(...));

for (int i = 0; i < hits.Length(); i++)
{
   int id = hits.Id(i);

   // do stuff
}