How much data per node in Cassandra cluster?

Posted 2020-02-14 02:24

Where are the limits of SSTable compaction (major and minor), and at what point does it become ineffective?

If a major compaction merges a couple of 500 GB SSTables and the resulting SSTable will be over 1 TB, is it still practical for a single node to "rewrite" such a big dataset?

On an HDD this can take about a day and requires roughly double the disk space, so are there any best practices for this?
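As a rough sanity check on the "double space, about a day" estimate, here is a minimal back-of-the-envelope sketch. The throughput figure and flow are assumptions (about 100 MB/s of sustained sequential HDD throughput, and a major compaction that reads all inputs and writes one merged output while the inputs still exist), not Cassandra defaults; real compactions are slower than this raw-I/O lower bound because of merging overhead and compaction throttling.

```python
# Back-of-the-envelope estimate for a major compaction on a single HDD.
# Assumed numbers (not Cassandra defaults): ~100 MB/s sustained sequential
# throughput; the merged output is written while the inputs still exist,
# so free space roughly equal to the input size is needed.

def estimate_major_compaction(input_gb: float, throughput_mb_s: float = 100.0):
    """Return (hours of raw I/O, GB of free space needed)."""
    bytes_moved_mb = 2 * input_gb * 1024        # read all inputs + write the output
    hours = bytes_moved_mb / throughput_mb_s / 3600
    return hours, input_gb

hours, headroom = estimate_major_compaction(1000)   # ~1 TB of SSTables
print(f"~{hours:.1f} h of raw I/O, ~{headroom:.0f} GB of free disk needed")
```

This prints roughly 5.7 hours of pure I/O for 1 TB, which is consistent with a full day of wall-clock time once merge CPU cost and throttling are added on top.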

1 Answer
Emotional °昔
answered 2020-02-14 03:10

1 TB is a reasonable limit on how much data a single node can handle, but in reality, a node is not at all limited by the size of the data, only the rate of operations.

A node might have only 80 GB of data on it, but if you absolutely pound it with random reads and it doesn't have a lot of RAM, it might not be able to serve those requests at a reasonable rate. Similarly, a node might have 10 TB of data, but if you rarely read from it, or if only a small portion of your data is hot (so it can be cached effectively), it will do just fine.

Compaction certainly is an issue to be aware of when you have a large amount of data on one node, but there are a few things to keep in mind:

First, the "biggest" compactions, ones where the result is a single huge SSTable, happen rarely, even more so as the amount of data on your node increases. (The number of minor compactions that must occur before a top-level compaction occurs grows exponentially by the number of top-level compactions you've already performed.)

Second, your node will still be able to handle requests; reads will just be slower.

Third, if your replication factor is above 1 and you aren't reading at consistency level ALL, other replicas will be able to respond quickly to read requests, so you shouldn't see a large difference in latency from a client perspective (see the driver example after these points).

Last, there are plans to improve the compaction strategy that may help with some larger data sets.
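To make the first point concrete, here is a minimal sketch of how rarely the biggest SSTable gets rewritten, assuming size-tiered behaviour with a hypothetical merge threshold of 4 (every four similar-sized SSTables are merged into one SSTable of the next tier); the flush size is also an assumed number, not read from any configuration:

```python
# How rarely does the biggest SSTable get rewritten under size-tiered
# compaction? Assumed parameters, not taken from a real cluster's config.

MIN_THRESHOLD = 4   # SSTables merged per compaction (assumed threshold)
FLUSH_GB = 0.1      # assumed size of one freshly flushed SSTable (~100 MB)

for tier in range(1, 8):
    flushes = MIN_THRESHOLD ** tier      # memtable flushes per tier-N rewrite
    sstable_gb = flushes * FLUSH_GB      # size of the SSTable produced at this tier
    print(f"tier {tier}: ~{sstable_gb:8.1f} GB SSTable, "
          f"rewritten once every {flushes} flushes")
```

Under these assumptions the ~1.6 TB SSTable at tier 7 is only rewritten once every ~16,000 flushes, which is why the largest compactions become exponentially rarer as the node fills up.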
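And to illustrate the third point, a hedged example with the DataStax Python driver: reading below consistency level ALL lets whichever replicas are not busy with a large compaction answer the read. The keyspace, table, and contact point below are hypothetical placeholders.

```python
# Reading at CL=ONE: the first replica to answer satisfies the read, so a
# node slowed down by compaction is simply outraced by its peers.
# Requires the DataStax Python driver (pip install cassandra-driver).

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

query = SimpleStatement(
    "SELECT * FROM my_table WHERE id = %s",   # placeholder table
    consistency_level=ConsistencyLevel.ONE,
)
print(session.execute(query, ("some-key",)).one())

cluster.shutdown()
```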
