I have a cassandra Column Family, or CQL table with the following schema:
CREATE TABLE user_actions (
company_id varchar,
employee_id varchar,
inserted_at timeuuid,
action_type varchar,
PRIMARY KEY ((company_id, employee_id), inserted_at)
) WITH CLUSTERING ORDER BY (inserted_at DESC);
Basically a composite partition key that is made up of a company ID and an employee ID, and a clustering column, representing the insertion time, that is used to order the columns in reverse chronological order (newest actions are at the beginning of the row).
Here's what an insert looks like:
INSERT INTO user_actions (company_id, employee_id, inserted_at, action_type)
VALUES ('acme', 'xyz', now(), 'started_project')
USING TTL 1209600; // two weeks
Nothing special here, except the TTL which is set to expire in two weeks.
The read path is also quite simple - we always want the latest 100 actions, so it looks like this:
SELECT action_type FROM user_actions
WHERE company_id = 'acme' and employee_id = 'xyz'
LIMIT 100;
The issue: I would expect that since we order in reverse chronological order, and the TTL is always the same amount of seconds on insertion - that such a query should not scan through any tombstones - all "dead" columns are at the tail of the row, not the head. But in practice we see many warnings in the log in the following format:
WARN [ReadStage:60452] 2014-09-08 09:48:51,259 SliceQueryFilter.java (line 225) Read 40 live and 1164 tombstoned cells in profiles.user_actions (see tombstone_warn_threshold). 100 columns was requested, slices=[-], delInfo={deletedAt=1410169639669000, localDeletion=1410169639}
and on rare occasions the tombstone number is large enough to abort the query completely. Since I see this type of schema design being advocated quite often, I wonder if I'm doing something wrong here?