Cassandra good for write and less read, HBase ran

Posted 2019-07-31 10:23

Is it right that Cassandra is good for writes and fewer reads, whereas HBase is good for random reads and writes? I heard that Facebook replaced Cassandra with HBase.

2 answers
forever°为你锁心
Reply #2 · 2019-07-31 11:15

HBase uses an LSM tree and delivers a steady insert/write rate. Random I/O does not come into the picture for regular writes, because an LSM tree turns them into sequential appends. If you are doing a bulk upload, the write-ahead log (WAL) can be bypassed so that writes go straight to the in-memory store, at the risk of losing unflushed writes if a region server crashes. If you want, you can use Hadoop or other data tools to write directly to HDFS for huge bulk uploads. You can improve write performance by increasing the number of region servers, as this leads to more WALs. But, as usual, it will bite you somewhere else, so be careful.
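To make the write path concrete, here is a minimal Python sketch of the LSM idea. It is a toy model, not HBase code: the class `ToyLSMStore`, its `flush_threshold` and `use_wal` parameters, and the list-based "log" and "segments" are all hypothetical stand-ins for the WAL, the in-memory store, and the sorted files on HDFS.

```python
import bisect


class ToyLSMStore:
    """Toy LSM-style store: append-only log, in-memory buffer, sorted flushes."""

    def __init__(self, flush_threshold=3, use_wal=True):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.use_wal = use_wal
        self.wal = []        # stands in for the write-ahead log (append-only)
        self.memstore = {}   # in-memory buffer of recent writes
        self.segments = []   # immutable sorted runs, oldest first

    def put(self, key, value):
        if self.use_wal:
            self.wal.append((key, value))  # sequential append, no seek
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal = []  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Newest segment wins, so search from the most recent flush backwards.
        for segment in reversed(self.segments):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None


store = ToyLSMStore(flush_threshold=3)
for key in ["row9", "row2", "row5", "row1"]:
    store.put(key, key.upper())
```

The keys arrive in random order, yet the store only ever appends to the log and writes whole sorted runs. With `use_wal=False` the log append is skipped, which is the toy equivalent of bypassing the WAL: faster, but anything still in `memstore` is lost on a crash.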

As for random reads, HBase can give you better performance if your block size is small enough. It locates the block that contains your data through the block index, then scans that block sequentially to reach your data. So use smaller blocks for random reads and bigger blocks for sequential reads. Smaller blocks cost some extra space, because the block index grows.
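The block-size trade-off can be sketched in a few lines of Python. This is an illustration only: `build_blocks` and `lookup` are hypothetical helpers, and blocks are measured in rows rather than bytes to keep the example small.

```python
import bisect


def build_blocks(sorted_rows, rows_per_block):
    """Split sorted (key, value) rows into blocks and index each block's first key."""
    blocks = [sorted_rows[i:i + rows_per_block]
              for i in range(0, len(sorted_rows), rows_per_block)]
    index = [block[0][0] for block in blocks]
    return blocks, index


def lookup(blocks, index, key):
    """Return (value, rows_scanned); value is None if the key is absent."""
    # Binary search the index for the last block whose first key is <= key.
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None, 0
    scanned = 0
    # Scan that single block sequentially until the key is found or passed.
    for k, v in blocks[pos]:
        scanned += 1
        if k == key:
            return v, scanned
        if k > key:
            break
    return None, scanned


rows = [(f"row{i:04d}", i) for i in range(1000)]
small_blocks, small_index = build_blocks(rows, rows_per_block=10)
large_blocks, large_index = build_blocks(rows, rows_per_block=250)

print(lookup(small_blocks, small_index, "row0749"))  # (749, 10)
print(lookup(large_blocks, large_index, "row0749"))  # (749, 250)
print(len(small_index), len(large_index))            # 100 4
```

With small blocks the lookup scans at most 10 rows but the index holds 100 entries; with large blocks the index holds only 4 entries but a single lookup may scan 250 rows.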

I am still learning Cassandra, so no comments about it.

贼婆χ
Reply #3 · 2019-07-31 11:16

Yes: Facebook started building Cassandra, released it as open source, and later migrated to HBase. I'm not exactly sure why, but Cassandra and HBase are both good solutions.

Cassandra's benefits:
+ high availability (HA, no single point of failure, SPOF)
+ tunable consistency
+ writes that are faster than reads (both are rather fast)
- But Cassandra may increase network traffic, as coordinator nodes have to communicate with the target nodes.
- Cassandra does its own data storage, whereas HBase uses HDFS by default. I strongly assume this was the reason for the switch: Facebook has massive amounts of data, and with HBase they can analyze it with less overhead, but at the price of a single point of failure.
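As a rough illustration of what "tunable consistency" means in Cassandra, the sketch below checks whether a read is guaranteed to overlap with the latest write, using the usual rule R + W > N for a replication factor N. The function names are hypothetical, and only the levels ONE, QUORUM, and ALL are modelled.

```python
def replicas_required(level, replication_factor):
    """Replica acknowledgements needed for ONE, QUORUM, or ALL."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


print(read_sees_latest_write("QUORUM", "QUORUM", 3))  # True
print(read_sees_latest_write("ONE", "ONE", 3))        # False
print(read_sees_latest_write("ONE", "ALL", 3))        # True
```

Writing and reading at ONE is the fastest combination but gives no overlap guarantee; QUORUM on both sides trades some latency for reads that always include a replica holding the latest write.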

HBase excels
+ when strong consistency is mandatory
+ at Hadoop integration
- But HMaster is a SPOF

Yes: Cassandra is very fast at writing bulk data in sequence and reading it back sequentially. HBase is very good at random I/O because of HDFS. In performance comparisons Cassandra is generally slightly better on throughput; HBase is slightly better on latency. From an operations perspective, Cassandra is very easy to maintain, as it is very reliable and has a robust system architecture. HBase is harder to set up and less robust because of HMaster and the accompanying ZooKeeper cluster it needs.

So in the end it depends entirely on your problem. I have never heard of anybody avoiding Cassandra, so I think HBase was just better.
