How can I write a CQL 3 DELETE row specification (WHERE clause) that will select only rows that are stored on a given node? If that is not possible, is there a SELECT relation (WHERE clause) that will indicate which rows are stored on a particular node?
I want to do this so I can have a housekeeping daemon (in Java) running on each data-store node that deletes old records from that node, so that the node does not run out of disk space. Because I am writing a daemon rather than performing a one-off cleanup, it is not appropriate to use the nodetool program to query for the token ranges stored on a node.
Here's one way that might work (but see below for a better idea). If you don't have vnodes enabled, you could identify the token ranges (from the nodetool ring command), then use them as part of your delete command. For example:
-- a node owns the range (previous node's token, its own token]
delete from MyTable where
token(MyPK) > Token1 and
token(MyPK) <= Token2 and
(your delete logic here)
;
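Since the daemon is in Java, the token-range bookkeeping for this approach could look roughly like the sketch below. It assumes the Murmur3Partitioner (tokens are signed 64-bit integers), one token per node (no vnodes), and ranges that exclude their start token and include their end token. The class TokenRange and all of its methods are hypothetical names made up for illustration, not part of any Cassandra driver, and the ring tokens are assumed to have been obtained some other way. It only covers a node's primary range, not the replicas it holds for other nodes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of any Cassandra driver): models one token
 * range of the ring as (start, end], i.e. start exclusive, end inclusive.
 */
final class TokenRange {
    final long start; // exclusive lower bound
    final long end;   // inclusive upper bound

    TokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** True if this range wraps past the largest token back to the smallest. */
    boolean wraps() {
        return start >= end;
    }

    /** True if the token falls inside (start, end], allowing for wrap-around. */
    boolean contains(long token) {
        if (!wraps()) {
            return token > start && token <= end;
        }
        return token > start || token <= end;
    }

    /**
     * Primary range of the node at position index, given every node's token
     * sorted ascending (one token per node, so no vnodes).
     */
    static TokenRange primaryRangeFor(long[] sortedTokens, int index) {
        int previous = (index + sortedTokens.length - 1) % sortedTokens.length;
        return new TokenRange(sortedTokens[previous], sortedTokens[index]);
    }

    /**
     * WHERE-clause fragments covering this range. A wrapping range is not one
     * contiguous interval, so it yields two fragments that would have to be
     * used in two separate statements.
     */
    List<String> toCqlPredicates(String partitionKeyColumn) {
        String tok = "token(" + partitionKeyColumn + ")";
        List<String> predicates = new ArrayList<>();
        if (!wraps()) {
            predicates.add(tok + " > " + start + " AND " + tok + " <= " + end);
        } else {
            predicates.add(tok + " > " + start);
            predicates.add(tok + " <= " + end);
        }
        return predicates;
    }
}
```

For a three-node ring with tokens -50, 0 and 50, TokenRange.primaryRangeFor(new long[]{-50L, 0L, 50L}, 0) gives the wrapping range (50, -50], which is why the first node needs two predicates rather than one. Whether your Cassandra version accepts such token predicates in a DELETE, as opposed to a SELECT, is something to verify before relying on this.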
However, a much simpler and safer method is to let Cassandra figure out where the data is and run the delete from any node:
delete from MyTable where
(your delete logic here)
;
nodetool getendpoints tells you which nodes (replicas) own a given partition key: http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsGetEndPoints.html?scroll=toolsGetEndPoints__toolsGetEndPtEx