I have a fairly simple need to do a conditional update in Solr, which is easily accomplished in MySQL.
For example,
- I have 100 documents with a unique field called
<id>
- I am POSTing 10 documents, some of which may be duplicate
<id>
s, in which case Solr would update the existing records with the same <id>
s
- I have a field called
<dateCreated>
and I would like to only update a <doc>
if the new <dateCreated>
is greated than the old <dateCreated>
(this applies to duplicate <id>
s only, of course)
How would I be able to accomplish such a thing?
The context is trying to combat race conditions resulting in multiple adds for the same ID but executing in the wrong order.
Thanks.
I can think of two ways:
- Write your own
UpdateHandler
and override addDoc
to implement that checking.
- Put the appropriate locks (critical sections) in your client code in order to fetch the stored document, compare the dates, and conditionally add the new document in a thread-safe manner.
Remember that Solr is not a database, comparing it to MySQL is comparing apples and oranges.
As of solr 4.0, optimistic concurrency is enabled via the _version_
field.
http://yonik.com/solr/optimistic-concurrency/
To enable, you need to make sure your schema.xml contains
<field name="_version_" type="long" indexed="true" stored="true"/>
and in solrconfig.xml
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<str name="dir">${solr.data.dir:}</str>
</updateLog>
</updateHandler>
With really custom addition logic like this, I find that writing your own client side updater works better. It keeps you from mucking around in Solr internals, which makes it easier to update in the future. You can definitly do this in SolrJ, but if you aren't a Java dev, there is probably a clientside library in your own preferred language... PHP, Python, Ruby, C# etc...
The rsolr Ruby gem (http://github.com/mwmitchell/rsolr/tree/master) makes it VERY easy to hack together a custom load script.