From the table schema below, I am trying to select all pH readings that are below 5.
I have followed these three pieces of advice:
- Use ALLOW FILTERING
- Include an equality comparison
- Create a secondary index on the reading_value column.
Here is my query:
select * from todmorden_numeric where sensor_name = 'pHradio' and reading_value < 5 allow filtering;
Which is rejected with this message:
Bad Request: No indexed columns present in by-columns clause with Equal operator
I tried adding a secondary index to the sensor_name column and was told that it was already part of the key and therefore already indexed.
I created the index after the table had been in use for a while - could that be the problem? I ran "nodetool refresh" in the hope it would make the index available but this did not work. Here is the output of describe table todmorden_numeric
:
CREATE TABLE todmorden_numeric (
sensor_name text,
reading_time timestamp,
reading_value float,
PRIMARY KEY ((sensor_name), reading_time)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='Data that suits being stored as floats' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX todmorden_numeric_reading_value_idx ON todmorden_numeric (reading_value);
Cassandra allows range search only on:
a) Partition Key only if ByteOrderPartitioner is used (default now is murmur3).
b) any single clustering key ONLY IF any clustering keys defined BEFORE the target column in the primary key definition are already specified by an = operator in the predicate.
They don't work on secondary indices.
Consider the following table definition:
You CAN'T do a range on the val in this case.
Consider this one:
Then the following is valid:
Kinda pointless, but this is not valid:
It's not valid as you haven't filtered by a previous clustering key in the primary key def (in this case, time).
For your query, you may want to do this:
Note the order of columns in the primary key: val's before time.
This will allow you to do:
On a different note, how long do you intend to hold data? How frequently do you get readings? This can cause your partition to grow quite large quite quickly. You may want to bucket it readings into multiple partitions (manual sharding). Perhaps one partition per day? Of course, such things would greatly depend on your access patterns.
Hope that helps.