Copy Solr HDFS Data to another Cluster

Posted 2019-07-13 13:40

Question:

I have a SolrCloud (v4.10) installation that sits on top of Cloudera (CDH 5.4.2) HDFS, with 3 Solr instances each hosting a shard of each core. I am looking for a way to incrementally copy the Solr data from our production cluster to our development cluster. There are 3 cores, but I am only interested in copying one of them.

I have tried to use Solr replication backup and restore, but that doesn't seem to load anything into the dev cluster.

http://host:8983/solr/core/replication?command=backup&location=/solr_transfer&name=core-name
http://host:8983/solr/core/replication?command=restore&location=/solr_transfer&name=core-name

I also tried to snapshot the /solr dir on the prod HDFS cluster and use hadoop distcp to copy the files, but the Solr indexer deletes some of the files mid-copy, so the distcp job fails.

hadoop distcp hftp://prod:50070/solr/* hdfs://dev:8020/solr/

Can anyone help me here?

Answer 1:

After a lot of trying, this is the solution we worked out:

- Initialise Solr in the second environment with all the collections set up in the same way as the primary.
- Take a snapshot of HDFS.
- Use hdfs dfs -cp to copy the data up to that checkpoint (see the sketch below).

After the first run the copy job will be quick, as you are only copying the increments.
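As a minimal sketch of that flow (the host names prod/dev, the path /solr/core-name and the snapshot name snap1 are placeholders, not taken from the original answer):

sudo -u hdfs hdfs dfsadmin -allowSnapshot /solr/core-name
sudo -u hdfs hdfs dfs -createSnapshot /solr/core-name snap1
hadoop fs -cp hdfs://prod:8020/solr/core-name/.snapshot/snap1/* hdfs://dev:8020/solr/core-name/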



Answer 2:

Please follow the steps below to create a snapshot of the Solr HDFS folder and move it to another cluster.

1. Allow snapshots

sudo -u hdfs hadoop dfsadmin -allowSnapshot /user/solr/SolrCollectionName

2. Create a snapshot with a specific name

sudo -u hdfs hadoop dfs -createSnapshot /user/solr/SolrCollectionName/ snapshotName

3. List the snapshot directory

hdfs dfs -ls /user/solr/SolrCollectionName/.snapshot

4. To copy the snapshot to the other cluster, execute the command below

sudo -u solr hadoop distcp hdfs://NNIP1:8020/user/solr/SolrCollectionName/.snapshot/snapshotName hdfs://NNIP2:8020/user/solr

5. To restore the snapshot

sudo -u solr hadoop fs -cp /user/solr/snapshotName/* /user/solr/SolrCollectionName/
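If the restored collection does not pick up the copied index straight away, reloading the collection (or restarting the Solr instances) should make the cores reopen the index from HDFS. As an example, with a placeholder host and collection name, the SolrCloud Collections API reload looks like:

http://host:8983/solr/admin/collections?action=RELOAD&name=SolrCollectionName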