I have a large Apache Jena TDB store and want to build a Lucene index for it with Apache Jena 2.10.2, for use with the new text search feature. I find the documentation hard to follow.
I first tried to configure it in code, but had trouble with the dependencies. Every combination of lucene-core and solr-solrj versions I tried resulted in either a 'classNotFound' error or a 'StandardAnalyzer overrides final method tokenStream' error. Example code:
Dataset ds1 = DatasetFactory.createMem() ;
EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label) ;
Directory dir = new RAMDirectory();
// Have also tried creating the index on disk instead of in memory:
// File indexDir = new File("luceneIndexes");
// Directory dir = FSDirectory.open(indexDir);
// Fails on this line
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ;
I think the only solution may be to create a Text Dataset assembler, but if anyone has advice on setting this up in code, I would prefer to do it that way.
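An "overrides final method" error usually suggests that classes from two different Lucene versions are being mixed on the classpath. One way to check is to list every location a given class file can be loaded from. The following is a minimal sketch, not a definitive diagnostic: `JarLocator` is a hypothetical helper (not part of Jena or Lucene), and the class names in `main` are only my guess at the relevant ones to inspect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists every classpath location that provides a class file.
public class JarLocator {

    // Returns the URLs of all copies of the class file visible to the system
    // class loader. The class itself is never loaded, so a broken or
    // mismatched class does not trigger a linkage error here.
    static List<String> locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        List<String> found = new ArrayList<>();
        try {
            Enumeration<URL> urls = ClassLoader.getSystemClassLoader().getResources(resource);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed to be the classes involved in the error; adjust as needed.
        String[] names = {
            "org.apache.lucene.analysis.Analyzer",
            "org.apache.lucene.analysis.standard.StandardAnalyzer"
        };
        for (String name : names) {
            List<String> hits = locate(name);
            System.out.println(name + ": " + (hits.isEmpty() ? "(not on classpath)" : hits.toString()));
        }
    }
}
```

If a class is reported in more than one jar, or `Analyzer` and `StandardAnalyzer` come from jars with different version numbers, that would point to a version conflict between the Lucene artifacts rather than a problem in the code above.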