I have a 2-node Cassandra cluster with a replication factor of 2 and AutoBootStrap=true. Everything is fine during startup and both nodes see each other. Let us call these nodes A and B.
- Add a set of keys and columns (let's call this set K1) to Cassandra through node A.
- Connect to node A and read back set K1. Do the same on node B. Both succeed. Good.
- Kill Cassandra process on Node B.
- Add set K2 through A.
- Connect to node A and read set K2. Good
- Restart Cassandra process on Node B.
- Try to read all keys from B: set K1 is present, set K2 is MISSING (even after 30 minutes).
- Add K3 to A/B.
- Read all keys from A - returns set K1, K2, K3
- Read all keys from B - returns set K1, K3.
B never syncs set K2 (it's been more than 12 hours). Why does node B not see set K2? Does anyone have any idea?
Added info:
OK, this was the problem: the read_consistency_level was set to 1 (ONE) by default.
So when we ask node B for set K2 and it doesn't have it (although it should, given the replication factor of 2), it immediately returns a 'Not found' error.
However, if we set the read consistency to QUORUM or ALL, then B is forced to ask A, which returns the correct value, and B syncs up that key (saves it locally).
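To make the behaviour I observed concrete, here is a toy Python model of it. It is only a sketch of what I saw, not how Cassandra is actually implemented: the `Replica` class and the `write` and `read` functions are made-up names for illustration, and the model assumes exactly two replicas, so QUORUM and ALL both need an answer from each of them.

```python
class Unavailable(Exception):
    """Raised when too few replicas are up for the requested consistency level."""


class Replica:
    """Hypothetical stand-in for one node; not a real Cassandra class."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (timestamp, value)


def write(replicas, key, value, timestamp):
    # Store the value on every replica that is currently up.
    # (Hinted handoff is deliberately left out of this toy model.)
    for replica in replicas:
        if replica.up:
            replica.data[key] = (timestamp, value)


def read(coordinator, replicas, key, consistency):
    if consistency == "ONE":
        # Only the coordinator's local copy is consulted.
        hit = coordinator.data.get(key)
        return hit[1] if hit is not None else None

    # QUORUM or ALL with two replicas: every replica has to answer.
    if not all(replica.up for replica in replicas):
        raise Unavailable("not enough live replicas for " + consistency)

    hits = [replica.data[key] for replica in replicas if key in replica.data]
    if not hits:
        return None

    # The copy with the newest timestamp wins and is written back to any
    # replica that is missing it or holds an older copy.
    newest = max(hits, key=lambda hit: hit[0])
    for replica in replicas:
        if replica.data.get(key) != newest:
            replica.data[key] = newest
    return newest[1]


a, b = Replica("A"), Replica("B")
write([a, b], "k1", "v1", timestamp=1)
b.up = False                              # kill B
write([a, b], "k2", "v2", timestamp=2)    # K2 only reaches A
b.up = True                               # restart B

print(read(b, [a, b], "k2", "ONE"))       # None
print(read(b, [a, b], "k2", "QUORUM"))    # v2
print(read(b, [a, b], "k2", "ONE"))       # v2
```

The last read at ONE succeeds only because the QUORUM read before it copied the key onto B.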
This leads to another problem: when node B comes back up, it does not sync all the data from node A, even after a long time. Now if node A goes down, how can we access that unsynced data? (I just tested this, and we can't.)
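Part of the reason is simple arithmetic: a quorum is a majority of the replicas, floor(RF / 2) + 1, so with a replication factor of 2 a QUORUM read needs both nodes and cannot be served while A is down. A small sketch of that calculation (the function names `quorum` and `tolerated_failures` are mine, not part of any Cassandra API):

```python
def quorum(replication_factor):
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    # Number of replicas that can be down while a quorum is still reachable.
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

So with two replicas QUORUM behaves like ALL, and it takes a replication factor of at least 3 before a QUORUM read can survive one node being down.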
I guess there must be a way to force the data to sync. The INFO output in the terminal shows that a hinted handoff of 15 rows from A to B occurred when B came up, but B does not have those rows locally (we still can't read them from B with consistency level ONE). What's going on here?
There are three ways Cassandra syncs updates that happened while a node was down: