Does SolrCloud's scalability extend to indexing?

Posted 2019-07-26 22:44

Question:

In all the literature I've seen, the scalability of SolrCloud appears to concern querying only: replication and sharding distribute the load of client queries across more CPUs and wider bandwidth.

But what about indexing?

Does SolrCloud's scalability improve indexing performance? Can it be configured to speed up indexing? In my case, we need to commit new content to the index frequently; does that special case change anything?

Mark Miller's presentation from Lucene Revolution 2012 is fascinating and covers some details of indexing. But it seems that certain cloud features (like replication) could conceivably make indexing slower, not faster. Has anyone tried SolrCloud?

Answer 1:

Well, I was finally able to set up a proper cloud environment for testing, and in brief, indexing speed suffers badly, even with RAMDirectory. I don't know whether the indexing speed is related to the number of followers in the cloud or to the number of collections, but a structure with 1 leader, 2 followers and 8 collections makes indexing 4 to 5 times slower. On a standalone instance I can index around 3.5M docs in 17 minutes, while with the same configuration for each instance in the cloud I can only index 650K docs in 17 minutes. I am not sure how to speed up SolrCloud indexing, and I am somewhat surprised to see my expectations of the cloud destroyed one by one as I keep running into new bugs and problems while working on it.

If this happens with other setups too, I don't understand the point of using SolrCloud. If indexing slows down this much, I can reindex everything on a classic standalone Solr instance much faster.

It would be really nice to hear about other experiences with SolrCloud, if anyone has tried it or runs it in a real environment.



Answer 2:

Which version of Solr are you using for SolrCloud? SolrCloud has been very stable since the Solr 4.8 release.

  1. You can increase indexing speed by not hard committing documents frequently; instead, commit in batches, e.g. every 45 or 60 s. This can be achieved with the autoCommit configuration in solrconfig.xml.

  2. A hard commit ensures that data is flushed to stable storage, but (when openSearcher is false) it does not make the changes visible; that is achieved by a soft commit. Set the soft commit interval to around 90-120 s. This can also be achieved with the autoSoftCommit configuration in solrconfig.xml.
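A minimal sketch of what such a configuration could look like, assuming a stock `solrconfig.xml`: both intervals live under `<updateHandler>` as `<autoCommit>` and `<autoSoftCommit>`, each with a `<maxTime>` in milliseconds. Setting `<openSearcher>false</openSearcher>` on the hard commit is an assumption here; it is what keeps hard commits from making documents visible, matching the behavior described in point 2. The helper name `build_update_handler` is hypothetical; it only generates the XML fragment so the values are easy to check.

```python
import xml.etree.ElementTree as ET


def build_update_handler(hard_commit_ms: int = 60_000,
                         soft_commit_ms: int = 120_000) -> str:
    """Return an <updateHandler> fragment for solrconfig.xml (illustrative sketch).

    hard_commit_ms: autoCommit interval; periodically flushes the index to stable storage.
    soft_commit_ms: autoSoftCommit interval; controls when new documents become visible.
    """
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Hard commit in batches. openSearcher=false (assumed setting) means this
    # commit persists data but does not open a new searcher on its own.
    auto_commit = ET.SubElement(handler, "autoCommit")
    ET.SubElement(auto_commit, "maxTime").text = str(hard_commit_ms)
    ET.SubElement(auto_commit, "openSearcher").text = "false"

    # Soft commit: makes newly indexed documents visible to searches.
    auto_soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(auto_soft, "maxTime").text = str(soft_commit_ms)

    ET.indent(handler)  # pretty-print; requires Python 3.9+
    return ET.tostring(handler, encoding="unicode")


if __name__ == "__main__":
    # 60 s hard commits and 120 s soft commits, within the ranges suggested above.
    print(build_update_handler(60_000, 120_000))
```

In SolrCloud the configuration is stored in ZooKeeper, so after editing `solrconfig.xml` the config set typically has to be re-uploaded and the collection reloaded before the new intervals take effect.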