I have a particular SQL join such that:
select DISTINCT ... 100 columns
from ... 10 tabes, some left joins
Currently I export the result of this query to XML using Toad (I'll query it straight from Java later). I use Java to parse the XML file, and I use Lucene (Java) to index it and to search the Lucene index. This works great: I get results 6-10 times faster than querying it from the database.
I need to think of a way to incrementally update this index when the data in the database changes.
Because I am joining tables (especially left joins) I'm not sure I can get a unique business key combination to do an incremental update. On the other hand, because I am using DISTINCT, I know that every single field is a unique combination. Given this information, I thought I could put the hashCode
of a document as a field of the document, and call updateDocument
on the IndexWriter
like this:
public static void addDoc(IndexWriter w, Row row) throws IOException {
//Row is simply a java representation of a single row from the above query
Document document = new Document();
document.add(new StringField("fieldA", row.fieldA, Field.Store.YES));
...
String hashCode = String.valueOf(document.hashCode());
document.add(new StringField("HASH", hashCode, Field.Store.YES));
w.updateDocument(new Term("HASH", hashCode), document);
}
Then I realized that updateDocument
was actually deleting the document with the matching hash code and adding the identical document again, so this wasn't of any use.
What is the way to approach this?