I'm currently testing Bigcouch with large amounts of data (15 million records daily).
When I need to generate views of the data, I run into a balancing problem, because one of my two machines is much weaker than the other (single-core vs. dual-core). As a result, the better machine finishes and sits idle while the weaker one still has a lot of work left.
My idea is to move some shards from the weaker machine to the other one, so that both finish at about the same time.
So my question is: how can I move shards from the weaker Bigcouch server to the better one?
Thank you for your help + best regards!
Andy
Bigcouch shards are simply CouchDB databases so the procedure for moving them is pretty simple. A future release of Bigcouch will automate the process but, for now, I'll just describe it.
A little background will help ground the explanation. A Bigcouch node is listening on two ports, 5984 and 5986. The front port, 5984, looks like CouchDB (while being clustered and fault-tolerant). The back port, 5986, talks directly to the underlying CouchDB server on a particular node. You will notice that there are two extra databases shown in localhost:5986/_all_dbs besides the shards of your database. One is called 'nodes' and you have already interacted with it when you set up your cluster. The other is called 'dbs' and contains a document for each clustered database, specifying where each copy of each shard of your database actually lives.
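To make the back port concrete, here is a minimal Python sketch for reading the layout document of a clustered database from 'dbs'. The helper names (layout_url, fetch_layout) and the localhost default are illustrative assumptions; only the port number and the 'dbs' database name come from the description above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BACK_PORT = 5986  # node-local CouchDB port, as described above


def layout_url(db_name, host="localhost", port=BACK_PORT):
    """Build the URL of the document in 'dbs' that describes db_name."""
    # The document id is the clustered database name; escape it for the path.
    return "http://%s:%d/dbs/%s" % (host, port, quote(db_name, safe=""))


def fetch_layout(db_name, host="localhost", port=BACK_PORT):
    """GET the layout document from the back port and parse it as JSON."""
    with urlopen(layout_url(db_name, host, port)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_layout("baz") against a running node should return a dictionary holding the by_node and by_range fields discussed in step 3.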
So, to move a shard, you need to do a few things:
- Identify the shard file.
- Copy the shard file to your new server.
- Tell Bigcouch about its new location.
- Top off with replication if needed.
Step 1
In the data directory of your Bigcouch node, you will find files like this:
shards/a0000000-bfffffff/foo.1312544893.couch
All shards are organized under the shards/ directory, then by range, and finally the name followed by a random number.
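If you want to pick shard files with a script, the naming scheme can be taken apart as in the sketch below. The regular expression is my own reading of the example path (eight hex digits on each side of the range, then name, numeric suffix and .couch), and parse_shard_path is a hypothetical helper, not part of Bigcouch.

```python
import re
from collections import namedtuple

ShardFile = namedtuple("ShardFile", "range_start range_end db_name suffix")

# shards/<start>-<end>/<db name>.<number>.couch, relative to the data directory
_SHARD_RE = re.compile(
    r"^shards/([0-9a-f]{8})-([0-9a-f]{8})/(.+)\.(\d+)\.couch$"
)


def parse_shard_path(path):
    """Split a shard file path into its range, database name and suffix."""
    match = _SHARD_RE.match(path)
    if match is None:
        raise ValueError("not a shard file path: %r" % (path,))
    return ShardFile(*match.groups())


shard = parse_shard_path("shards/a0000000-bfffffff/foo.1312544893.couch")
# shard.db_name is "foo", and the range runs from a0000000 to bfffffff
```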
Select one of the files for your database and remember its name.
Step 2
Use any method to copy this file to the same path on your target server. rsync and scp are fine choices, as is CouchDB replication (be sure to replicate from port 5986 to port 5986).
Step 3
The document in 'dbs' that governs the layout of your clustered database needs to be modified. It looks a bit like this:
{"_id":"baz","_rev":"1-912fe2dd63e0a570a4ceb26fd742dffd","shard_suffix": [46,49,51,49,50,53,52,53,50,49,55],"changelog":[["add","00000000-7fffffff","dev1@127.0.0.1"],["add","80000000-ffffffff","dev1@127.0.0.1"]],"by_node":{"dev1@127.0.0.1":["00000000-7fffffff","80000000-ffffffff"]},"by_range":{"00000000-7fffffff":["dev1@127.0.0.1"],"80000000-ffffffff":["dev1@127.0.0.1"]}}
Update both the by_node and by_range values so that the shard you have moved resolves to the new host.
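As a rough sketch, the edit to the layout document could be done like this in Python. move_shard is a hypothetical helper and dev2@127.0.0.1 is a made-up node name. The sketch touches only by_node and by_range and leaves the changelog alone. Dropping a node from by_node once it holds no ranges is my assumption.

```python
import copy


def move_shard(layout, shard_range, source, target):
    """Return a copy of layout with shard_range reassigned from source to target."""
    doc = copy.deepcopy(layout)  # leave the caller's document untouched

    nodes = doc["by_range"][shard_range]
    if source not in nodes:
        raise ValueError("%s does not hold %s" % (source, shard_range))

    # by_range: swap the source node for the target node, without duplicates.
    nodes = [n for n in nodes if n != source]
    if target not in nodes:
        nodes.append(target)
    doc["by_range"][shard_range] = nodes

    # by_node: remove the range from the source, add it to the target.
    remaining = [r for r in doc["by_node"].get(source, []) if r != shard_range]
    if remaining:
        doc["by_node"][source] = remaining
    else:
        # Assumption: a node that holds no ranges is dropped from by_node.
        doc["by_node"].pop(source, None)
    target_ranges = doc["by_node"].setdefault(target, [])
    if shard_range not in target_ranges:
        target_ranges.append(shard_range)
    return doc
```

The returned document, which still carries the current _rev, would then be saved back to 'dbs' on port 5986 with a normal CouchDB PUT.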
At this point you have moved the shard. However, any updates made after you started copying the file but before you updated the 'dbs' document were written to the original node and are not visible on the new one. If that may have happened, proceed to step 4. If there have been no updates, you can delete the shard on the original server, though I recommend you first check your database on port 5984 to be sure all your docs show up correctly.
Step 4
Perform a replication from the source shard to the target shard, again taking care to do this on the 5986 port of each. This will ensure that all updates are available once again. You can now delete the copy of this shard on the original server.
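The replication can be triggered through CouchDB's standard _replicate endpoint. The sketch below only builds the request. It assumes that the shard database name on port 5986 is the file path without the .couch extension, and that the shard_suffix list in the 'dbs' document is the file suffix as character codes. The host names you pass in and the helper names are placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def shard_db_name(shard_range, db_name, shard_suffix):
    """Assumed shard database name: shards/<range>/<db name><suffix>."""
    suffix = "".join(chr(code) for code in shard_suffix)
    return "shards/%s/%s%s" % (shard_range, db_name, suffix)


def shard_url(host, shard_db, port=5986):
    """Back-port URL of a shard; slashes in the name must be escaped."""
    return "http://%s:%d/%s" % (host, port, quote(shard_db, safe=""))


def replication_request(source_host, target_host, shard_db, port=5986):
    """Build a POST to _replicate on the target node's back port."""
    body = {
        "source": shard_url(source_host, shard_db, port),
        "target": shard_url(target_host, shard_db, port),
    }
    return Request(
        "http://%s:%d/_replicate" % (target_host, port),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urlopen() would start the replication. Check that it completed before deleting the source shard.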
HTH,
Robert Newson - Cloudant.