Why does an index update take so much time in Solr?

Posted 2019-04-16 04:48

Question:

I am using Solr 4.10.3. I have to update about 100,000 documents in the index. The update requests are issued one per document, like this:

curl 'localhost:8900/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"org.wikipedia.ur:http/wiki/%D9%85%DB%8C%D9%84","group":{"set":"wiki"}}]'

Twelve hours after starting the update, only 48,000 documents had been updated.

Where is the problem? Can anyone guide me?

Answer 1:

You are issuing a hard commit with every curl request. This forces Solr to flush a segment (the Lucene data structure that stores the index) to disk on every commit. Solr always writes new data into new segments, so this pattern forces it to create roughly 100,000 segments.
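
Instead of committing per document, you can batch several documents into one request and let Solr commit on a timer via the commitWithin update parameter. A minimal sketch, reusing the host, port, and field names from the question (the 60000 ms window is an illustrative value, and the second document id is a hypothetical placeholder):

# commitWithin=60000 asks Solr to commit within 60 s; batch size and window are illustrative
curl 'localhost:8900/solr/update?commitWithin=60000' -H 'Content-type:application/json' -d '[
  {"id":"org.wikipedia.ur:http/wiki/%D9%85%DB%8C%D9%84","group":{"set":"wiki"}},
  {"id":"org.wikipedia.ur:http/wiki/ANOTHER_DOC","group":{"set":"wiki"}}
]'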

By default, Solr uses TieredMergePolicy as its mergePolicy with a mergeFactor of 10, which triggers a merge whenever there are about 10 roughly equal-sized segments. These merges run in the background via the ConcurrentMergeScheduler implementation.
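
For reference, this merge behaviour is configured in the indexConfig section of solrconfig.xml. A minimal sketch of the defaults described above (the values shown mirror the stock defaults, not tuning advice):

<indexConfig>
  <!-- TieredMergePolicy: merge once ~10 similar-sized segments accumulate -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
  <!-- merges run on background threads -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexConfig>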

This merge process is CPU and I/O intensive. You can use a soft commit instead of a hard commit here; that should help.
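
You can also take commits out of the update requests entirely and let the server schedule them, using autoCommit and autoSoftCommit in the updateHandler section of solrconfig.xml. A sketch with illustrative intervals (pick values that fit your latency and durability needs):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk every 60 s without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: make updates visible to searches every 5 s -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>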



Answer 2:

You should use a soft commit, like this:

curl 'localhost:8900/solr/update?softCommit=true' -H 'Content-type:application/json' -d '[{"id":"org.wikipedia.ur:http/wiki/%D9%85%DB%8C%D9%84","group":{"set":"wiki"}}]'
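
Keep in mind that a soft commit makes changes visible to searches but does not flush segments to disk, so you still need a hard commit eventually for durability. Assuming the same host and port as above, one hard commit at the end of the whole batch is enough:

# single hard commit after all updates have been sent
curl 'localhost:8900/solr/update?commit=true'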