I have cassandra 2.1.8 cluster with 16 nodes (Centos 6.6, 1x4core xeon, 32Gb RAM, 3x3Tb HDD, java 1.8.0_65) and trying to add 16 more, one by one, but stuck with the first one.
After starting cassandra process on the new node, 16 streams from previously existing nodes to newly added node are starting:
nodetool netstats |grep Already
Receiving 131 files, 241797656689 bytes total. Already received 100 files, 30419228367 bytes total
Receiving 150 files, 227954962242 bytes total. Already received 116 files, 29078363255 bytes total
Receiving 127 files, 239902942980 bytes total. Already received 103 files, 29680298986 bytes total
...
new node is in 'joining' state (last line):
UN ...70 669.64 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0
UN ...71 638.09 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0
UN ...80 667.07 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1
UJ ...81 102.99 GB 256 ? c20e431e-7113-489f-b2c3-559bbd9916e2 R2
During few hours process of joining looks normal, but after that the cassandra process on new node is dying with oom exception:
ERROR 09:07:37 Exception in thread Thread[Thread-1822,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
I've made 6 or 7 attempts, with both CMS and G1 GC, MAX_HEAP_SIZE from 8G (default) up to 16G, with no luck. It seems cassandra catch oom due to out on heap in defferent places:
RROR [CompactionExecutor:6] 2015-11-08 04:42:24,277 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:75) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1822) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1967) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1810) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:90) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:38) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:155) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:144) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:427) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.compaction.CompactionManager$10.run(CompactionManager.java:1144) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Futher expanding of MAX_HEAP_SIZE leads to death of cassandra from system oom-killer.
Any ideas?