I'm upgrading from Lucene 3.6 to Lucene 4.0-beta. In Lucene 3.x, IndexReader has a method, IndexReader.getTermFreqVectors(), which I can use to extract the frequency of each term in a given document and field. This method has now been replaced by IndexReader.getTermVectors(), which returns Terms. How can I use this (or perhaps other methods) to extract the term frequencies for a given document and field?
There is various documentation on how to use the flexible indexing APIs:
Accessing the Fields/Terms for a document's term vectors uses exactly the same API you use for accessing the postings lists, since term vectors are really just a miniature inverted index for that one document.
So it's perfectly OK to use all those examples as-is, though you can take some shortcuts since you know there is only ever one document in this "miniature inverted index". For example, if you just want the frequency of a term, you can seek to it and use the aggregate statistics such as totalTermFreq (see https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/core/org/apache/lucene/index/package-summary.html#stats), rather than actually opening a DocsEnum that will only ever enumerate a single document.
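To see why the shortcut is safe, here is a minimal plain-Java sketch of the idea. It is a toy model, not Lucene code: MiniTermVector, TermEntry, and Posting are hypothetical names invented for illustration. It builds a one-document inverted index and shows that the aggregate statistic for a term equals the frequency you would get by walking that term's (single-entry) postings list. In Lucene 4.x the corresponding calls would be roughly IndexReader.getTermVector(docId, field) to get the Terms, Terms.iterator(null) to get a TermsEnum, and TermsEnum.totalTermFreq() once positioned on a term; the exact seekExact signature changed between 4.x releases, so check the javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for one document's term vector. Hypothetical, not a Lucene class. */
class MiniTermVector {

    /** One posting: a document id plus the term's frequency in that document. */
    static final class Posting {
        final int docId;
        final long freq;
        Posting(int docId, long freq) { this.docId = docId; this.freq = freq; }
    }

    /** Dictionary entry: a precomputed aggregate statistic plus the postings list. */
    static final class TermEntry {
        long totalTermFreq;
        final List<Posting> postings = new ArrayList<>();
    }

    // Sorted term dictionary, as in an inverted index.
    private final TreeMap<String, TermEntry> dictionary = new TreeMap<>();

    /** Indexes the tokens of exactly one document. */
    MiniTermVector(int docId, String[] tokens) {
        Map<String, Long> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1L, Long::sum);
        }
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            TermEntry entry = new TermEntry();
            entry.postings.add(new Posting(docId, e.getValue()));
            entry.totalTermFreq = e.getValue();
            dictionary.put(e.getKey(), entry);
        }
    }

    /** Shortcut: look the term up and read the aggregate statistic. */
    long totalTermFreq(String term) {
        TermEntry entry = dictionary.get(term);
        return entry == null ? 0 : entry.totalTermFreq;
    }

    /** Long way round: enumerate the postings and pick out the one document. */
    long freqViaPostings(String term, int docId) {
        TermEntry entry = dictionary.get(term);
        if (entry == null) {
            return 0;
        }
        for (Posting p : entry.postings) {
            if (p.docId == docId) {
                return p.freq;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        MiniTermVector tv = new MiniTermVector(0, "to be or not to be".split(" "));
        // Both routes give the same answer because there is only one document.
        System.out.println(tv.totalTermFreq("to") + " " + tv.freqViaPostings("to", 0));
    }
}
```

Because the index holds a single document, the sum over all documents and the per-document frequency are the same number, which is why reading the aggregate is enough.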
Perhaps this will help you:
I have this working on my Lucene 4.2 index. This is a small test program that works for me.
See this related question, specifically