Strange behavior of timeuuid comparison

2019-09-02 01:16发布

问题:

I have Cassandra 2.x cluster with 3 nodes and the db scheme like this:

cqlsh> CREATE KEYSPACE test_ks WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3} AND durable_writes = true;
cqlsh> CREATE TABLE IF NOT EXISTS test_ks.test_cf (
   ...   time timeuuid,
   ...   user_id varchar,
   ...   info varchar,
   ...   PRIMARY KEY (time, user_id)
   ... ) WITH compression = {'sstable_compression': 'LZ4Compressor'} AND compaction = {'class': 'LeveledCompactionStrategy'};

Lets add some data (wait some time betweeb inserts):

cqlsh> INSERT INTO test_ks.test_cf (time, user_id, info) VALUES (now(), 'user1', 'pythonista');
cqlsh> INSERT INTO test_ks.test_cf (time, user_id, info) VALUES (now(), 'user1', 'mr. Haskell');

Lets look at our data:

cqlsh> SELECT dateOf(time), user_id, info FROM test_ks.test_cf;

 dateOf(time)             | user_id | info
--------------------------+---------+-------------
 2014-06-24 16:00:31+0700 |   user1 | mr. Haskell
 2014-06-24 15:59:32+0700 |   user1 |  pythonista

(2 rows)

I get some strange results while querying test_cf CF:

cqlsh> SELECT dateOf(time) FROM test_ks.test_cf WHERE user_id='user1' AND token(time) >= token(maxTimeuuid('2014-06-24 16:00:31+0700')) ALLOW FILTERING;

 dateOf(time)
--------------------------
 2014-06-24 15:59:32+0700

(1 rows)

cqlsh> SELECT dateOf(time) FROM test_ks.test_cf WHERE user_id='user1' AND token(time) >= token(maxTimeuuid('2014-06-24 16:00:32+0700')) ALLOW FILTERING;

 dateOf(time)
--------------------------
 2014-06-24 15:59:32+0700

(1 rows)

cqlsh> SELECT dateOf(time) FROM test_ks.test_cf WHERE user_id='user1' AND token(time) >= token(maxTimeuuid('2014-06-24 16:00:33+0700')) ALLOW FILTERING;

 dateOf(time)
--------------------------
 2014-06-24 16:00:31+0700
 2014-06-24 15:59:32+0700

(2 rows)

As you can see comparison gives the wrong results, although, timeuuid must not be greater than maxTimeuuid and must be greater than 'minTimeuuid' (for a same datetime, of course =) ). Could somebody explain to me this strange behavior?

TIA!

回答1:

First off, your primary key is backwards—if you want to perform queries like this, your CREATE TABLE statement should look like this:

CREATE TABLE IF NOT EXISTS test_ks.test_cf (
  time timeuuid,
  user_id varchar,
  info varchar,
  PRIMARY KEY (user_id, time)
) WITH compression = {'sstable_compression': 'LZ4Compressor'}
  AND compaction = {'class': 'LeveledCompactionStrategy'};

That makes user_id the partition key and time the clustering column, which fits your query pattern.

With this change, you no longer need to use the TOKEN function, which is needed to make range queries over partition keys, but is meaningless for clustering columns (and generally does not correspond to semantic ordering of column values). Since time is now a clustering column, you just need to do a normal comparison:

SELECT dateOf(time)
FROM test_ks.test_cf
WHERE user_id='user1'
  AND time >= maxTimeuuid('2014-06-24 16:00:31+0700');

You can also drop the ALLOW FILTERING, since you're now doing a standard range slice rather than the token comparison, which required scanning all the rows and making the comparison.