I am using Solr 4.10.3. I have to update about 100,000 documents. I am sending one atomic update per document, like this:
curl 'localhost:8900/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"org.wikipedia.ur:http/wiki/%D9%85%DB%8C%D9%84","group":{"set":"wiki"}}]'
After 12 hours of running these updates, only 48,000 documents had been updated.
Where is the problem? Can anyone guide me?
You are issuing a hard commit with each curl request. This forces Solr to flush the segment (the Lucene data structure that stores the index) to disk on every commit. Solr always writes data into new segments, so it looks like you are forcing it to create 100K segments.
Solr uses TieredMergePolicy as its default mergePolicy, with a default mergeFactor of 10, which merges segments whenever Solr has 10 segments of roughly equal size. This merge process runs in the background using the ConcurrentMergeScheduler implementation.
This merge process is CPU-intensive. You can use a soft commit instead of a hard commit here; that should help.
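A soft commit makes documents visible to searchers without flushing segments to disk; durability is still handled by periodic hard commits. Rather than committing from the client at all, you can let the server schedule commits via `solrconfig.xml`. The intervals below are illustrative values, not recommendations tuned for your setup:

```xml
<!-- Sketch: hard commit at most once a minute, without opening a new
     searcher (cheap durability flush). -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit every 5 seconds so updates become searchable quickly. -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```

With this in place you can drop the `commit=true` parameter from your update requests entirely.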
You should use a soft commit, like this:
curl 'localhost:8900/solr/update?softCommit=true' -H 'Content-type:application/json' -d '[{"id":"org.wikipedia.ur:http/wiki/%D9%85%DB%8C%D9%84","group":{"set":"wiki"}}]'
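Separately from the commit strategy, sending 100K individual HTTP requests adds a lot of overhead. You can batch several documents into one request and use `commitWithin` to let Solr commit lazily. The ids below (`doc1`, `doc2`) are placeholders for your own document ids:

```shell
# Sketch: two atomic updates in a single request; Solr commits
# within 10 seconds instead of once per document.
curl 'localhost:8900/solr/update?commitWithin=10000' \
  -H 'Content-type:application/json' \
  -d '[{"id":"doc1","group":{"set":"wiki"}},
       {"id":"doc2","group":{"set":"wiki"}}]'
```

Batching a few hundred documents per request, combined with server-side commit settings, should reduce the total runtime dramatically.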