Question:
I need to update a large number of documents in Solr very often. For example, setting "online" = true for user_id = 5, and so on. But indexing via the HTTP handler is very slow. Solr supports deleting documents by query; is there a way to update by query?
Answer 1:
No, unfortunately there isn't any update-by-query feature. It would be really useful, for instance as a way to update a document without resubmitting it entirely; there's a five-year-old JIRA issue for that. For now you have to re-submit your documents with the updated fields; they will be overwritten (that is, deleted and re-inserted) if you use the same uniqueKey.
By the way, are you making an HTTP request for each document you update? If so, you can speed things up by submitting more than one document per request, like this:
<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office">Bridgewater</field>
  </doc>
  <doc>
    <field name="employeeId">05992</field>
    <field name="office">Bridgewater</field>
  </doc>
  <doc>
    <field name="employeeId">05993</field>
    <field name="office">Bridgewater</field>
  </doc>
</add>
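A batched <add> payload like the one above can also be built programmatically. Here is a minimal sketch in Python (standard library only) that assembles the XML for a list of documents; the commented-out host, port, and endpoint are assumptions to adjust for your setup.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Build one Solr <add> payload from a list of field dicts,
    so many documents travel in a single HTTP request."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add_xml([
    {"employeeId": "05991", "office": "Bridgewater"},
    {"employeeId": "05992", "office": "Bridgewater"},
])

# To actually send it (hypothetical URL -- adjust to your setup):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8983/solr/update",
#     data=payload.encode("utf-8"),
#     headers={"Content-Type": "text/xml"})
# urllib.request.urlopen(req)
```

The fewer HTTP round-trips you make, the less per-request overhead you pay, which is usually the dominant cost when updating many small documents.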
Answer 2:
There's still no update-by-query, but the answers from 2012 are out of date. Since Solr 4.x there are Atomic Updates (https://wiki.apache.org/solr/Atomic_Updates), so you can do what you want in two steps without needing access to the original document.
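With atomic updates you send only the uniqueKey plus the fields to change, using a modifier such as "set". A sketch of the JSON payload for the question's example follows; the field names ("id", "online") and the URL in the comment are assumptions based on the question, not verified schema details.

```python
import json

# Atomic update: only the uniqueKey and the fields to modify are sent.
# "set" replaces the field's value; Solr also supports modifiers
# such as "add" and "inc".
updates = [{"id": "5", "online": {"set": True}}]
payload = json.dumps(updates)

# POST this to http://<host>:<port>/solr/update?commit=true with
# Content-Type: application/json. Note that atomic updates require
# the other fields to be stored so Solr can rebuild the document.
print(payload)
```

This avoids re-fetching and re-submitting the whole document, which is exactly the pain point in the question.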
Answer 3:
As javanna answered, there is no facility to update by query, and Solr also does not let you update individual fields of a document stored in the index, so re-submitting is the only way to update. I am curious, though, why your updates are so slow. Below are a few ways you could improve update speed.
If you are issuing a commit after each individual document, wait and issue the commit only after you have updated a whole batch of documents in the index. From the Solr Tutorial:
Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end. There is also an optimize command that does the same thing as commit, in addition to merging all index segments into a single segment, making it faster to search and causing any deleted documents to be removed.
Look at using soft commits or auto soft commits to reduce update latency. See the NearRealtimeSearch page on the Solr Wiki for details.
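Another way to avoid per-request hard commits is Solr's commitWithin update parameter, which asks Solr to commit within a time window and lets it group many updates into one commit. A small sketch that builds such a request URL; the host, port, and core path are assumptions.

```python
from urllib.parse import urlencode

# commitWithin asks Solr to commit within N milliseconds, so many
# update requests can share a single commit instead of committing
# once per request.
params = urlencode({"commitWithin": 10000})
update_url = "http://localhost:8983/solr/update?" + params
print(update_url)
```

You would POST your <add> payload to this URL and let Solr decide when to commit, instead of appending commit=true to every request.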
Answer 4:
You can develop a minimal Solr plugin that does the work for you on the Solr server side.
Have a look at:
Discussion on Solr mailing list
Answer 5:
I would use the DataImportHandler (DIH) with a modified SQL query that accepts parameters from the URL. The SQL query would look like:
SELECT user_name, user_online FROM users WHERE user_id=${dataimporter.request.user_id}
Then, to reindex the selected users, add a user_id parameter to the URL like this:
http://<host>:<port>/solr/dataimport?command=full-import&clean=false&user_id=5
Docs about using DIH with custom parameters: Solr - DataImportHandler