Cassandra query failure (Tombstones)

2019-06-05 17:53发布

问题:

So this is driving me crazy. i tried querying one of my table in Cassandra and it showed query failure. i tried digging dip in to the reason behind it and found that it was because of tombstone. i changed GC_GRACE_SECONDS to Zero and triggered Compaction using nodetool, And when i queried again it worked fine. however on a subsequent calls query failed again with a same reason. i am using cassandra-nodejs driver. This is my data model.

CREATE TABLE my_table (
    firstname text,
    lastname text,
    email text,
    mobile text,
    date timeuuid,
    value float,
    PRIMARY KEY (firstname, lastname, email, mobile)
) WITH CLUSTERING ORDER BY (lastname ASC, email ASC, mobile ASC);

this is the query i want to perform on that data model.

SELECT firstname, email, toDate(date) as date, mobile, value FROM my_table  WHERE date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= minTimeuuid('2017-03-28 23:59:59+0000') ALLOW FILTERING;

the result will have approx 40k rows. this shows that if we delete something it will be marked as tombstone and will get deleted After GC_GRACE_SECONDS setted for given table. If i understand it correctly then.

  1. how come there be tombstone problem when i never delete any row of table?
  2. Is that true a row will be marked as Tombstone if and only if we delete a row?
  3. clearing tombstones and then querying the same works sometimes and sometimes it does not, why is so?.
  4. is it a good idea to increase tombstone_failure_threshold value? (single node cluster application)

I am using cassandra 3.5, with cqlsh version 5.0.1. And the query works fine with terminal, but gives error when we execute using external client (express app using nodejs driver for cassandra). i have a single node cluster app.

EDIT 1

This is the log of my Inserted null value in field (i inserted only name and timestamp);

  activity                                                                                        | timestamp                  | source        | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                              Execute CQL3 query | 2017-03-29 10:28:27.342000 | 172.31.34.179 |              0
                   Parsing select * FROM testtomb WHERE name = 'Dhaval45'; [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |             64
                                                       Preparing statement [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            101
                              Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            210
                                              Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            223
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            243
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            288
                                         Read 2 live and 0 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 |            310
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 |            323
                                                                                Request complete | 2017-03-29 10:28:27.342385 | 172.31.34.179 |            385

And this is the log when i query on filed which i have executed a delete query. Initially user Dhaval15 has firstname 'aaaa' and then i the cell aaa. then again executing select query on same user gave me this log.

       activity                                                                                        | timestamp                  | source        | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                              Execute CQL3 query | 2017-03-29 10:35:18.581000 | 172.31.34.179 |              0
                   Parsing select * FROM testtomb WHERE name = 'Dhaval15'; [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |             65
                                                       Preparing statement [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            113
                              Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            223
                                              Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            235
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            256
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            305
                                         Read 1 live and 1 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            338
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            351
                                                                                Request complete | 2017-03-29 10:35:18.581430 | 172.31.34.179 |            430

回答1:

In Cassandra tombstone created even if you don't execute delete query, when you insert null value.

Tombstone consume space. When you execute select query cassandra needs to filter out data by tombstone. If huge tombstone generated your select query performance will degrade.

Your query failing because of huge tombstone and ALLOW FILTERING. Don't use ALLOW FILTERING on production. it's very costy. When you execute query without specifying partition key, Cassandra needs to scan all the row of all the nodes.

Change your data model to like the below one :

CREATE TABLE my_table (
    year int,
    month int,
    date timeuuid,
    email text,
    firstname text,
    lastname text,
    mobile text,
    value float,
    PRIMARY KEY ((year, month), date)
);

Here you can specify year and month extract from the date.
Now you can query with specifying partition key :

SELECT * FROM my_table WHERE year = 2017 AND month = 03 AND date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= minTimeuuid('2017-03-28 23:59:59+0000') ;

This will return result very efficiently and will not fail.

If you need to query with firstname and lastname create an index on them

CREATE INDEX index_firstname ON my_table (firstname) ;
CREATE INDEX index_lastname ON my_table (lastname) ;

Then you can query with firstname or last name

SELECT * FROM my_table WHERE firstname = 'ashraful' ;
SELECT * FROM my_table WHERE lastname  = 'islam' ;

Here i have not create index on email and phone because of high cardinality problem. Instead create materialized view or another table to query with phone or email

CREATE MATERIALIZED VIEW mview_mobile AS
    SELECT *
    FROM my_table
    WHERE mobile IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
    PRIMARY KEY (mobile, year, month, date);


CREATE MATERIALIZED VIEW mview_email AS
        SELECT *
        FROM my_table
        WHERE email IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
        PRIMARY KEY (email, year, month, date);

Now you can query with phone or email

SELECT * FROM mview_mobile WHERE mobile = '018..';
SELECT * FROM mview_email WHERE email = 'ashraful@...';

More about cassandra tombstone : http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html



回答2:

  1. how come there be tombstone problem when i never delete any row of table?

@Ashraful Islam's answer is correct.

In addition, If you explicitly insert data with NULL values, it will internally create tombstones.

Ex: insert into my_table (firstname,lastname,email,mobile, .....) values ('abd', 'gef', 'xxx@abc.com', '+67899...', null, null, .....);

There will be null values for other columns (internally generating tombstones as Cassandra has to indicate with something that those columns do not exists or have values).

As you have mentioned too many columns as primary keys it will create too large wide rows (which is also discouraged) with too many null values thus end up with lots of tombstones. It may have crossed the threshold limit in you case.

As @Ashraful stated this data model is not okay. As you need to query by time you should design your model such that you can do some range queries on time. Query without mentioning partition key thus using ALLOW FILTERING in a large data set is an anti-pattern in Cassandra.

  1. Is that true a row will be marked as Tombstone if and only if we delete a row?

You can delete a complete or a specific column. If a row is deleted, the entire row is marked as tombstone. If a specific column is deleted (or insert/UPDATE NULL value) tombstones will be created on that specific column (which is also discouraged) I guess that happens in your case.

  1. clearing tombstones and then querying the same works sometimes and sometimes it does not, why is so?.

I think you may now determine why this happens :).

  1. is it a good idea to increase tombstone_failure_threshold value? (single node cluster application)

Increasing tombstone_failure_threshold will just lessen the amount of error but does not solve the actual problem. It will not increase performance and cross the threshold as data set increases.

And importantly, in single node cluster application you can set GC_GRACE_SECONDS to 0. Tombstones will be deleted instantly after compaction occurs. GC_GRACE_SECONDS is vital while using multi-node cluster which is the actual purpose of using NoSQL I guess.