Cassandra: cannot restrict 2 columns using cluster

2019-07-05 17:41发布

I have a schema pretty similar to this:-

create table x(id int, start_date timestamp, end_date timestamp, 
primary key((id), start_date, end_date)) 
with clustering order by (start_date desc, end_date desc);

Now I am stuck with a problem where I have to query between start date and end date. something like this : -

select count(*) from x where id=2 and start_date > 'date' and end_date < 'date' ;

But it gives me an error similar to the following: -

InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "end_date" 
cannot be restricted (preceding column "start_date" is restricted 
by a non-EQ relation)"

I am new to cassandra, any and all suggestions are welcomed even if it requires us to do a schema change. :)

1条回答
啃猪蹄的小仙女
2楼-- · 2019-07-05 18:32

You don't say which version of Cassandra you are running, but in 2.2 and later you can do multi-column slice restrictions on clustering columns. This can get close to what you want. The syntax in CQL is a little ugly, but basically you have to specify the starting range with all the clustering columns specified, like a compound key. It's important to think about the rows being sorted first by the first column, then within that sorted by the second column.

So assume we have this data:

SELECT * from x;

 id | start_date               | end_date
----+--------------------------+--------------------------
  2 | 2015-09-01 09:16:47+0000 | 2015-11-01 09:16:47+0000
  2 | 2015-08-01 09:16:47+0000 | 2015-10-01 09:16:47+0000
  2 | 2015-07-01 09:16:47+0000 | 2015-09-01 09:16:47+0000
  2 | 2015-06-01 09:16:47+0000 | 2015-10-01 09:16:47+0000

Now let's select based on both dates:

SELECT * from x where id=2 
    and (start_date,end_date) >= ('2015-07-01 09:16:47+0000','2015-07-01 09:16:47+0000') 
    and (start_date,end_date) <= ('2015-09-01 09:16:47+0000','2015-09-01 09:16:47+0000');

 id | start_date               | end_date
----+--------------------------+--------------------------
  2 | 2015-08-01 09:16:47+0000 | 2015-10-01 09:16:47+0000
  2 | 2015-07-01 09:16:47+0000 | 2015-09-01 09:16:47+0000

Now you'll notice that one of those end dates appears to be later than our restriction, but it isn't. Since things are sorted by start_date first, you'll get all the end dates with a matching start_date since they are in the range of the compound range. To get rid of rows like that you'll probably need to do a little filtering on the client side.

See more information here, under "Multi-column slice restrictions".

查看更多
登录 后发表回答