cassandra 2.1.8, adding new node(s), out of memory

Published 2019-07-13 12:26

I have a Cassandra 2.1.8 cluster with 16 nodes (CentOS 6.6, 1x 4-core Xeon, 32 GB RAM, 3x 3 TB HDD, Java 1.8.0_65) and am trying to add 16 more, one by one, but I'm stuck on the first one.

After starting the Cassandra process on the new node, 16 streams from the previously existing nodes to the newly added node begin:

nodetool netstats |grep Already
Receiving 131 files, 241797656689 bytes total. Already received 100 files, 30419228367 bytes total
Receiving 150 files, 227954962242 bytes total. Already received 116 files, 29078363255 bytes total
Receiving 127 files, 239902942980 bytes total. Already received 103 files, 29680298986 bytes total
    ...

The new node is in the 'joining' state (last line):

UN ...70 669.64 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0      
UN ...71 638.09 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0    
UN ...80 667.07 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1 
UJ ...81 102.99 GB 256 ? c20e431e-7113-489f-b2c3-559bbd9916e2 R2

For a few hours the joining process looks normal, but then the Cassandra process on the new node dies with an out-of-memory exception:

ERROR 09:07:37 Exception in thread Thread[Thread-1822,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)

I've made 6 or 7 attempts, with both CMS and G1 GC and with MAX_HEAP_SIZE from 8G (the default) up to 16G, with no luck. Cassandra seems to hit the OOM from running out of heap in different places:

ERROR [CompactionExecutor:6] 2015-11-08 04:42:24,277 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:75) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1822) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1967) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1810) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:90) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:38) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:155) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:144) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:427) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.compaction.CompactionManager$10.run(CompactionManager.java:1144) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

Further increasing MAX_HEAP_SIZE just gets Cassandra killed by the system oom-killer.
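For reference, a minimal sketch of where these knobs live in conf/cassandra-env.sh (the HEAP_NEWSIZE value below is illustrative and only applies to CMS, not G1):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="16G"
# CMS only; the usual rule of thumb is ~100 MB per core:
HEAP_NEWSIZE="400M"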

Any ideas?

1 Answer

叛逆 · 2019-07-13 12:46

I ran into exactly the same issue (see my JIRA ticket), and it appears to have been related to a table that had lots of tombstones (size-tiered compaction often doesn't do a good job cleaning them up). One potential triage measure is to simply restart the node with auto_bootstrap set to false, then run nodetool rebuild to finish the process. This will cause the existing data to be preserved while allowing the new node to serve traffic.
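A minimal sketch of that triage sequence, assuming a package install managed via the service script and a source datacenter named DC1 (both placeholders for your environment):

# On the stuck joining node:
sudo service cassandra stop

# In cassandra.yaml, add or set:
#   auto_bootstrap: false

sudo service cassandra start

# Then stream the node's missing data without re-running bootstrap:
nodetool rebuild DC1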

But you probably still have an underlying issue causing the OOM. Something very large is being materialized into memory during the streaming session (obviously), and it's likely either:

  1. A very large partition, which can happen unexpectedly. Check nodetool cfstats and look at the maximum partition bytes (see the commands after this list). If this is the case, you need to deal with the root data model problem and clean up that data.

  2. A lot of tombstones. You should see a warning about this in the log.

If you do have one of these issues, you will almost certainly have to address it before you will be able to stream successfully.
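
A quick way to check for both conditions, assuming the stock 2.1 cfstats output and the default log location (my_keyspace.my_table and the log path are placeholders for your setup):

# Largest compacted partition per table:
nodetool cfstats my_keyspace.my_table | grep 'Compacted partition maximum bytes'

# Tombstone warnings from the read path:
grep -i tombstone /var/log/cassandra/system.log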
