HBase and Hadoop

Posted 2019-08-15 05:40

From what I have read so far, HBase requires a Hadoop installation. It looks like HBase can be set up either to use an existing Hadoop cluster (shared with other users) or to use a dedicated Hadoop cluster. I guess the latter would be the safer configuration, but I am wondering whether anybody has experience with the former (though I am not sure my understanding of the HBase setup is correct).

Tags: hadoop hbase
3 answers
甜甜的少女心
Reply #2 · 2019-08-15 05:55

In distributed mode, Hadoop is used for its HDFS storage. HBase stores its HFiles on HDFS, and thus benefits from the replication strategies and data-locality principles provided by the datanodes.

RegionServers mostly handle local data, but may still have to fetch data from other datanodes.
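
As a rough sketch of what pointing HBase at an existing HDFS looks like, the `hbase-site.xml` fragment below sets the `hbase.rootdir` and `hbase.cluster.distributed` properties. The namenode hostname, port, and `/hbase` path are placeholders I made up for illustration, so substitute the values from your own cluster.

```xml
<configuration>
  <!-- Where HBase keeps its HFiles and write-ahead logs on HDFS.
       Hostname, port, and path are hypothetical placeholders. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <!-- Run HBase in distributed mode rather than standalone. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```

Whether the HDFS behind that URI is shared or dedicated makes no difference to this file; only the address changes.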

Hope that helps you understand why and how Hadoop is used with HBase.

女痞
Reply #3 · 2019-08-15 06:04

We've set it up with an existing Hadoop cluster that's 1,000 cores strong. Short answer: it works just fine, at least with Cloudera CH2 +149.88. Depending on your Hadoop version, though, your mileage may vary.

太酷不给撩
Reply #4 · 2019-08-15 06:05

I know that Facebook and other large organizations separate their HBase cluster (real-time access) from their Hadoop cluster (batch analytics) for performance reasons. Large MapReduce jobs on a shared cluster can degrade the performance of the real-time interface, which can be problematic.

In a smaller organization, or in a situation where your HBase response time doesn't need to be consistent, you can just use the same cluster.

There aren't many (or any) concerns with coexistence other than performance concerns.
