I am quite new to Cassandra (I just learned it through the DataStax courses). I can't find enough information about buckets, either here or elsewhere on the Internet, and my application needs buckets to split its data.
I have some instruments that will take a lot of measures. Splitting the measures by day (a day timestamp in the partition key) might be risky, because a partition can easily exceed the commonly recommended limit of about 100 MB. Each measure concerns a specific object identified by an ID. So I would like to use buckets, but I don't know how.
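As a back-of-the-envelope check, the 100 MB guideline can be turned into a row cap and a bucket count. The sketch below is minimal and rests on an assumed average row size of 250 bytes. The real figure depends on the column values and Cassandra's per-row overhead, so it should be measured (for example, `nodetool tablehistograms` reports partition size percentiles). `rows_per_partition` and `buckets_needed` are illustrative names:

```python
def rows_per_partition(budget_bytes: int, avg_row_bytes: int) -> int:
    """Largest number of rows whose estimated total size fits in the budget."""
    return budget_bytes // avg_row_bytes


def buckets_needed(rows_per_day: int, max_rows_per_bucket: int) -> int:
    """Smallest bucket count that keeps every bucket at or under the row cap."""
    # Integer ceiling division; always at least one bucket per day.
    return max(1, -(-rows_per_day // max_rows_per_bucket))


# Hypothetical figures: 250-byte rows, a 100 MB budget,
# and 1,000,000 measures per instrument per day.
cap = rows_per_partition(100 * 1024 * 1024, 250)
print(cap)                                 # 419430
print(buckets_needed(1_000_000, 400_000))  # 3
```

This assumes the load is evenly spread. The cap should be derived from the busiest instrument on its busiest day, not from an average.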
I'm using Cassandra 3.7.
Here is roughly what my table will look like:
```sql
CREATE TABLE measures (
    instrument_id bigint,
    day timestamp,
    bucket int,
    measure_timestamp timestamp,
    measure_id uuid,
    measure_info float,
    object_id bigint,
    PRIMARY KEY ((instrument_id, day, bucket), measure_timestamp, measure_id)
);
```
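One common way to choose a bucket without counting rows is to derive it from the measure's own timestamp, so every writer computes the same partition key independently. Below is a minimal Python sketch. It assumes that `day` holds midnight UTC and uses a hypothetical bucket width of one hour. The width would have to be chosen so that the busiest instrument's busiest hour stays under the 400 000-row target. `partition_key`, `BUCKET_SECONDS` and `BUCKETS_PER_DAY` are illustrative names, not part of any driver API:

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical bucket width: one bucket per hour, so each day has exactly 24 buckets.
BUCKET_SECONDS = 3600
BUCKETS_PER_DAY = 86400 // BUCKET_SECONDS


def partition_key(instrument_id: int, measure_ts: datetime) -> Tuple[int, datetime, int]:
    """Derive (instrument_id, day, bucket) for one measure from its timestamp.

    Assumes measure_ts is timezone-aware and that 'day' is midnight UTC.
    """
    ts = measure_ts.astimezone(timezone.utc)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds_into_day = int((ts - day).total_seconds())
    return instrument_id, day, seconds_into_day // BUCKET_SECONDS


print(partition_key(42, datetime(2017, 3, 1, 13, 45, 10, tzinfo=timezone.utc)))
```

Because the width is fixed, the number of buckets per day is also fixed (24 here), so nothing has to be looked up. The trade-off is that a bursty hour still lands in a single partition. Hashing `measure_id` modulo a fixed N would spread writes more evenly, but the time order would then be scattered across buckets.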
I thought of adding object_id to the partition key, but then I would lose the "flow of measures" made by an instrument. What interests me is seeing all the measures made by an instrument on a specific day or over a period of time.
- So the question is: when I want to request all the records of a day for a specific instrument, how can I do that if there are many buckets?
- If I want the partition limit to be 400 000 rows, how can I know, when inserting data, which bucket to insert it into?
- Is there a way of knowing how many buckets there are?
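A common pattern for reading a whole day back is to query each bucket of the (instrument_id, day) pair separately. The per-bucket results, which Cassandra already returns sorted by the measure_timestamp clustering column, can then be merged. The minimal sketch below assumes that the number of buckets per day is known, as it is with a fixed time-based or hash-based scheme. It also assumes a hypothetical `fetch_bucket` callback that runs the per-bucket query through whatever driver is in use. The row shape is an assumption as well:

```python
import heapq
from datetime import datetime, timezone
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical row shape: (measure_timestamp, measure_id, measure_info).
Row = Tuple[datetime, str, float]


def read_day(fetch_bucket: Callable[[int], Iterable[Row]], buckets_per_day: int) -> Iterator[Row]:
    """Fetch every bucket of one (instrument_id, day) and merge the rows in time order.

    fetch_bucket(b) is expected to run something like
        SELECT measure_timestamp, measure_id, measure_info FROM measures
        WHERE instrument_id = ? AND day = ? AND bucket = ?
    and to return rows already sorted by measure_timestamp.
    """
    per_bucket = [fetch_bucket(b) for b in range(buckets_per_day)]
    # heapq.merge lazily interleaves the sorted inputs into one sorted stream.
    return heapq.merge(*per_bucket, key=lambda row: row[0])


# Example with in-memory stand-ins for two buckets.
sample = {
    0: [(datetime(2017, 3, 1, 0, 5, tzinfo=timezone.utc), "m1", 1.5)],
    1: [(datetime(2017, 3, 1, 1, 10, tzinfo=timezone.utc), "m2", 2.5)],
}
for row in read_day(lambda b: sample[b], 2):
    print(row[1])  # m1, then m2
```

With a row-count scheme (start a new bucket every 400 000 rows), the bucket count is not fixed. The application would then have to record it somewhere, for example in a small side table keyed by (instrument_id, day). That table is hypothetical and not part of the schema above.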
Thank you very much for your help!