How to optimize a Solr index

Published 2019-02-12 13:38

Question:

How do I optimize a Solr index? I tried making changes in solrconfig.xml and my documents get indexed, but I want to know how to verify that the index is actually optimized, and which factors are involved in index optimization.

Answer 1:

I find this to be the easiest way to optimize a Solr index. In my context "optimize" means to merge all index segments.

curl http://localhost:8983/solr/<core_name>/update -F stream.body='<optimize/>'
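
To verify that the optimize worked, you can check the segment count. Here is a minimal Python sketch, assuming the default Luke request handler at /admin/luke and a recent Solr that reports segmentCount (host, port, and core name are placeholders to adjust):

import requests

# Ask the Luke request handler for index statistics.
# Host, port, and core name are placeholders; adjust to your setup.
resp = requests.get(
    "http://localhost:8983/solr/your_core/admin/luke",
    params={"numTerms": 0, "wt": "json"},
)
index_info = resp.json()["index"]

# A fully optimized index is merged into a single segment
# and no longer carries deleted documents.
print("segments:    ", index_info.get("segmentCount"))
print("deleted docs:", index_info.get("deletedDocs"))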


Answer 2:

Check the size of the respective core's index before you start.

Open Terminal 1:

watch -n 10 "du -sh /path/to/core/data/*"

Open Terminal 2 and execute:

curl http://hostname:8980/solr/<core>/update?optimize=true

Replace "<core>" with the name of your core.

You will see the size of the core grow gradually to roughly double the size of your indexed data, then drop suddenly: the old segments stay on disk until the new merged segment is complete, which is why the extra headroom is needed. How long this takes depends on the amount of data in your Solr index.

For instance, 50 GB of indexed data spiked to nearly 90 GB and then dropped to 25 GB once optimized. It normally takes 30-45 minutes for this amount of data.
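
The same procedure fits in one script. Here is a minimal Python sketch mirroring the two terminals above: it fires the optimize in a background thread and watches the data directory while the merge runs (host, port, core name, and data directory are placeholders to adjust):

import os
import threading
import time
import requests

CORE_URL = "http://hostname:8980/solr/your_core"  # placeholder host/port/core
DATA_DIR = "/path/to/core/data"                   # the directory you would pass to du

def dir_size_gb(path):
    # Recursively sum file sizes under the index data directory.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # segment files can vanish mid-merge
    return total / 1024 ** 3

# The optimize request blocks until the merge finishes, so run it in a
# background thread and report disk usage from the main thread.
worker = threading.Thread(
    target=lambda: requests.get(f"{CORE_URL}/update", params={"optimize": "true"})
)
worker.start()
while worker.is_alive():
    print(f"index size: {dir_size_gb(DATA_DIR):.2f} GB")
    time.sleep(10)
print(f"done, final size: {dir_size_gb(DATA_DIR):.2f} GB")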

See also the Solr FAQ entry: "Why doesn't my index directory get smaller (immediately) when I delete documents? force a merge? optimize?"



Answer 3:

You need to pass optimize=true on the update request to optimize the Solr index.

http://[HostName]:[port]/solr/[core_name]/update?optimize=true



Answer 4:

There are different ways to optimize an index. You could trigger one of the basic Solr scripts: http://wiki.apache.org/solr/SolrOperationsTools#optimize

You could also set optimize=true on a (full) import or while adding new data, or simply trigger a commit with an optimize (see the sketch after the link below).

This may also be interesting for your needs: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
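
For example, here are both variants in Python; this is a sketch assuming the DataImportHandler is registered at /dataimport in solrconfig.xml, and host, port, and core name are placeholders:

import requests

CORE_URL = "http://localhost:8983/solr/your_core"  # placeholder host/port/core

# Optimize as part of a DataImportHandler full import
# (assumes DIH is configured under /dataimport).
requests.get(f"{CORE_URL}/dataimport",
             params={"command": "full-import", "optimize": "true"})

# Or send the explicit <optimize/> update message described in the wiki link above.
requests.post(f"{CORE_URL}/update",
              data="<optimize/>",
              headers={"Content-Type": "text/xml"})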



Answer 5:

To test how much a configuration change improves indexing, write a custom indexer that adds randomly generated content. Add a large number of documents (500,000 or 1,000,000) and measure the time it takes.

Following the articles shared above, I wrote myself a custom indexer and managed to reduce the time it took to index documents by 80%.
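
A minimal sketch of such an indexer in Python, assuming the JSON update handler and a schema with matching fields (the URL, batch size, and field names are placeholders to adapt):

import random
import string
import time
import requests

UPDATE_URL = "http://localhost:8983/solr/your_core/update"  # placeholder host/port/core
TOTAL = 500_000   # number of random documents to index
BATCH = 1_000     # documents per update request

def random_doc(i):
    # A throwaway document with random text; the field names are
    # placeholders and must exist in (or match dynamic fields of) your schema.
    words = ("".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(20))
    return {"id": str(i), "title_s": f"doc {i}", "body_t": " ".join(words)}

start = time.time()
for offset in range(0, TOTAL, BATCH):
    docs = [random_doc(i) for i in range(offset, offset + BATCH)]
    requests.post(UPDATE_URL, json=docs).raise_for_status()

# Commit once at the end; committing every batch would dominate the timing.
requests.post(UPDATE_URL, json={"commit": {}}).raise_for_status()
print(f"indexed {TOTAL} docs in {time.time() - start:.1f}s")

Run it once before and once after your solrconfig.xml change and compare the two timings.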