MaximumRetryException when reading data off Cassandra

Posted 2019-08-28 19:46

I am inserting time series data into a wide column family, using the timestamp (T) as the column name, so that a single row stores 24 hours' worth of data. The streaming data comes from a data generator (4 instances, each with 256 threads) that inserts into multiple rows in parallel.

CF2 (Wide column family):

RowKey1 (T1, V1) (T2, V3) (T4, V4) ......

RowKey2 (T1, V1) (T3, V3) .....

:

:

I am now attempting to read this data off Cassandra using multiget. The client is written in Python and uses pycassa.

When I query 24 hours' worth of data for a single row key, over 2 million data points are returned. With 4 row keys in parallel using multiget, I get the following error:

  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\columnfamily.py", line 772, in multiget
    packed_keys[offset:offset + buffer_size], cp, sp, consistency)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 576, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "c:\Python27-64bit\lib\site-packages\pycassa-1.9.1-py2.7.egg\pycassa\pool.py", line 148, in new_f
    (self._retry_count, exc.__class__.__name__, exc))
pycassa.pool.MaximumRetryException: Retried 6 times. Last failure was timeout: timed out

However, previously I was able to get data with 256 row keys in parallel. Now that I have increased the density of the data (i.e., the number of data points within a time range), the queries fail with this error.

We tweaked the buffer_size in multiget and found 8 to be the sweet spot.

HeapSize: 6GB

RAM: 16GB

Read Consistency: ONE

Cluster: 5 node

The CPU utilization is less than 20%. Also, I do not see any abnormal patterns in Read throughput, read latency, disk throughput & disk latency as reported by OpsCenter.

I also tried increasing read_request_timeout_in_ms in cassandra.yaml to 60000, but to no avail. Are there any other pointers as to why we are getting this exception? I would expect the query to take a long time to return results, but not to fail.

Thanks,

VS

1 answer
混吃等死
#2 · 2019-08-28 20:38

(Answer copied from the same question on the pycassa mailing list.)

Last failure was timeout: timed out

This indicates a client-side timeout, so adjusting read_request_timeout_in_ms won't fix this. Instead, adjust the timeout parameter for your ConnectionPool; the default is 0.5 seconds.
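
As a minimal sketch of that change, assuming pycassa is installed: the helper names pool_settings and open_column_family are hypothetical, and the 30-second timeout is only an illustrative starting point, not a recommended value. The timeout and max_retries keyword arguments are the ones ConnectionPool accepts.

```python
def pool_settings(timeout_seconds=30.0, max_retries=5):
    """Keyword arguments for ConnectionPool; both values are illustrative."""
    return {
        "timeout": timeout_seconds,   # client-side socket timeout, in seconds
        "max_retries": max_retries,   # retries before MaximumRetryException
    }


def open_column_family(keyspace, servers, cf_name, **overrides):
    """Hypothetical helper: build a pool with a longer timeout, return the CF."""
    # Imported lazily so this module still loads where pycassa is absent.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool(keyspace, server_list=servers,
                          **pool_settings(**overrides))
    return ColumnFamily(pool, cf_name)
```

It could be called as open_column_family("MyKeyspace", ["host1:9160"], "CF2", timeout_seconds=60.0), where the keyspace and host are placeholders. Raising the client timeout only helps if the server-side read timeout is at least as generous, so the two settings are worth adjusting together.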

For really wide rows, you may also want to experiment with using xget() in parallel across multiple threads. This will distribute the load from your queries a little better and not put such a burden on one coordinator node.
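
One way this might look, as a rough sketch rather than a tuned implementation: scan_row and scan_rows_parallel are hypothetical helper names, the thread count and buffer_size are guesses to experiment with, and the code assumes a single ColumnFamily object can be shared between threads (if that does not hold in your setup, create one per thread).

```python
from multiprocessing.pool import ThreadPool


def scan_row(cf, key, on_column, buffer_size=1024):
    """Stream one wide row through on_column; return (key, column_count)."""
    count = 0
    # xget() yields (column_name, value) pairs and pages through the row
    # buffer_size columns at a time instead of fetching it in one response.
    for name, value in cf.xget(key, buffer_size=buffer_size):
        on_column(key, name, value)
        count += 1
    return key, count


def scan_rows_parallel(cf, keys, on_column, num_threads=4, buffer_size=1024):
    """Scan several rows concurrently; return {row_key: column_count}.

    on_column is called from worker threads, so it must be thread-safe.
    """
    pool = ThreadPool(num_threads)
    try:
        pairs = pool.map(
            lambda k: scan_row(cf, k, on_column, buffer_size), keys)
    finally:
        pool.close()
        pool.join()
    return dict(pairs)
```

Passing each column to a callback avoids holding 2 million data points per row in memory at once, and each xget() page is a small request, so no single call has to finish within the client timeout while carrying a whole row.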
