Cassandra timeout during read query at consistency

Posted 2019-02-20 20:48

Question:

I have a problem with my Cassandra database and hope somebody can help me. I have a table “log” into which I have inserted about 10000 rows. Everything works fine; I can run

select * from

select count(*) from

As soon as I insert 100000 rows with a TTL of 50 seconds, I receive an error with

select count(*) from

Version: Cassandra 2.1.8, 2 nodes

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Does anyone have an idea what I am doing wrong?

CREATE TABLE test.log (
    day text,
    date timestamp,
    ip text,
    iid int,
    request text,
    src text,
    tid int,
    txt text,
    PRIMARY KEY (day, date, ip)
) WITH read_repair_chance = 0.0
   AND dclocal_read_repair_chance = 0.1
   AND gc_grace_seconds = 864000
   AND bloom_filter_fp_chance = 0.01
   AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
   AND comment = ''
   AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
   AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
   AND default_time_to_live = 0
   AND speculative_retry = '99.0PERCENTILE'
   AND min_index_interval = 128
   AND max_index_interval = 2048;

Answer 1:

That error message indicates a problem with the READ operation; most likely it is a READ timeout. You may need to update your cassandra.yaml with a larger read timeout, as described in this SO answer.

Example for 200 seconds (the value is given in milliseconds):

read_request_timeout_in_ms: 200000

If updating that does not work, you may need to tweak the JVM settings for Cassandra. See DataStax's "Tuning Java Ops" for more information.



Answer 2:

count(*) is a very costly operation: Cassandra has to scan every row on every node just to give you the count. With a small number of rows it works, but on bigger data sets you should use other approaches to avoid timeouts.

  • First of all, forget about count(*): retrieve the rows one by one and count them yourself.
  • Make several (dozens? hundreds?) queries, each filtering by partition and clustering key, and sum the number of rows retrieved by each query.
  • Here is a good explanation of what clustering and partition keys are. In your case, day is the partition key, and the clustering key is composed of two columns: date and ip.
  • It is most likely impossible to do this with the cqlsh command-line client, so you should write a script yourself. Official drivers for popular programming languages: http://docs.datastax.com/en/developer/driver-matrix/doc/common/driverMatrix.html

An example of one such query:

select day, date, ip, iid, request, src, tid, txt from test.log where day='Saturday' and date='2017-08-12 00:00:00' and ip='127.0.0.1'
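The partition-by-partition approach from the bullets above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes you already know which `day` values (partitions) exist, and `fetch_rows` is a hypothetical callable injected so the code does not depend on a running cluster. Each query reads a single partition, and rows are counted client-side instead of with count(*).

```python
from typing import Callable, Iterable, Sequence

# Hypothetical: fetch_rows(cql, params) runs one query and returns its rows.
FetchRows = Callable[[str, tuple], Iterable[object]]

# "day" is the partition key of test.log, so each query reads one partition.
# %s is the placeholder style the DataStax Python driver uses for
# non-prepared statements.
PARTITION_QUERY = "SELECT date, ip FROM test.log WHERE day = %s"


def count_rows(fetch_rows: FetchRows, days: Sequence[str]) -> int:
    """Count rows partition by partition, row by row, without count(*)."""
    total = 0
    for day in days:
        # Count while iterating so rows are never held in memory.
        total += sum(1 for _ in fetch_rows(PARTITION_QUERY, (day,)))
    return total


if __name__ == "__main__":
    # In-memory stand-in for the cluster, for illustration only.
    fake_table = {"Saturday": [("d1", "ip1"), ("d2", "ip2")], "Sunday": [("d3", "ip3")]}
    fetch = lambda cql, params: fake_table.get(params[0], [])
    print(count_rows(fetch, ["Saturday", "Sunday"]))  # prints 3
```

With the DataStax Python driver, `fetch_rows` could be `lambda cql, params: session.execute(cql, params)`, where `session = Cluster(["127.0.0.1"]).connect()`; the driver pages through large result sets automatically while you iterate. A very large partition could be split further with a range on the `date` clustering column.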

Remarks:

  • If you only need the count and nothing more, it probably makes sense to look for a tool like https://github.com/brianmhess/cassandra-count

  • If Cassandra refuses to run your query without ALLOW FILTERING, that means the query is not efficient: https://stackoverflow.com/a/38350839/2900229
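If you don't know the partition keys in advance, an alternative to querying partition by partition is to split the token ring into sub-ranges and run a smaller count(*) on each one, then add up the results. The sketch below illustrates that technique; it is not taken from cassandra-count. It assumes the default Murmur3Partitioner (signed 64-bit tokens), and `run_count` is a hypothetical callable that executes one query and returns its count.

```python
from typing import Callable, List, Tuple

# Assumption: the cluster uses Murmur3Partitioner (the default since
# Cassandra 1.2), whose tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

# One count(*) per token sub-range; %s placeholders as in the Python driver.
RANGE_QUERY = ("SELECT count(*) FROM test.log "
               "WHERE token(day) > %s AND token(day) <= %s")


def token_ranges(splits: int) -> List[Tuple[int, int]]:
    """Split the ring into `splits` contiguous, non-overlapping (lo, hi] ranges."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def count_by_token_range(run_count: Callable[[str, Tuple[int, int]], int],
                         splits: int) -> int:
    """Run one small count(*) per sub-range and add up the partial counts."""
    return sum(run_count(RANGE_QUERY, rng) for rng in token_ranges(splits))


if __name__ == "__main__":
    # Print the queries for a 4-way split; a real run would execute each one.
    for lo, hi in token_ranges(4):
        print(RANGE_QUERY % (lo, hi))
```

With the DataStax Python driver, `run_count` could be `lambda cql, rng: session.execute(cql, rng).one()[0]`. Each sub-range query is still a range scan, so with a large table you may need many splits before individual requests finish within the read timeout.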