First of all, I am a total noob when it comes to Tika and Lucene. I am working through the Tika in Action book and trying out the examples. Chapter 5 gives this example:
package tikatest01;

import java.io.File;
import org.apache.tika.Tika;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;

public class LuceneIndexer {
    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        document.add(new Field(
            "filename", file.getName(),
            Store.YES, Index.ANALYZED));
        document.add(new Field(
            "fulltext", tika.parseToString(file),
            Store.NO, Index.ANALYZED));
        writer.addDocument(document);
    }
}
And this main method:
package tikatest01;

import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.tika.Tika;

public class TikaTest01 {
    public static void main(String[] args) throws Exception {
        String filename = "C:\\testdoc.pdf";
        File file = new File(filename);
        IndexWriter writer = new IndexWriter(
            new SimpleFSDirectory(file),
            new StandardAnalyzer(Version.LUCENE_30),
            MaxFieldLength.UNLIMITED);
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            indexer.indexDocument(file);
        }
        finally {
            writer.close();
        }
    }
}
I've added the libraries tika-app-1.5.jar, lucene-core-4.7.0.jar and lucene-analyzers-common-4.7.0.jar to the project.
Questions:
With the current version of Lucene, Field.Index is deprecated. What should I use instead?
MaxFieldLength is not found. Am I missing an import?
For Lucene 4.7 this code for the indexer:
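A minimal sketch of the updated indexer, assuming `TextField` is the intended replacement for the deprecated `Field` constructor with `Index.ANALYZED` (a `TextField` always tokenizes its value). Everything else is unchanged from the question:

```java
package tikatest01;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.tika.Tika;

public class LuceneIndexer {

    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        // TextField is analyzed (tokenized), like the old Index.ANALYZED.
        // The file name is also stored so it can be shown in search results.
        document.add(new TextField("filename", file.getName(), Store.YES));
        // The extracted text is indexed for searching but not stored.
        document.add(new TextField("fulltext", tika.parseToString(file), Store.NO));
        writer.addDocument(document);
    }
}
```

If the file name should be matched as one exact value rather than analyzed, `StringField` would be the alternative to look at.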
And this code for the main class:
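A sketch of the main class against the 4.7 API, relying on the `LuceneIndexer` class from the question. `MaxFieldLength` no longer exists in Lucene 4.x, so no import will fix it; the analyzer now goes into an `IndexWriterConfig`, which is passed to the two-argument `IndexWriter` constructor. The `indexDir` path and the `openWriter` helper are hypothetical, introduced here on the assumption that the index should live in its own directory rather than at the path of the PDF itself:

```java
package tikatest01;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;

public class TikaTest01 {

    // Hypothetical helper: opens a writer on the given index directory.
    static IndexWriter openWriter(File indexDir) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        return new IndexWriter(new SimpleFSDirectory(indexDir), config);
    }

    public static void main(String[] args) throws Exception {
        File file = new File("C:\\testdoc.pdf");
        // Assumed location for the index; any writable directory will do.
        File indexDir = new File("C:\\tika-index");

        IndexWriter writer = openWriter(indexDir);
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            indexer.indexDocument(file);
        } finally {
            writer.close();
        }
    }
}
```

By default the 4.x writer does not truncate fields, which corresponds to the old `MaxFieldLength.UNLIMITED` setting.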
Lucene 4.7 has no IndexWriter constructor of this kind. Take a look at the API: http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/IndexWriter.html
It shows only a constructor with two parameters, so you need to adapt this example to the new Lucene API.