How does HBase enable Random Access to HDFS?

Posted 2019-03-15 17:21

Given that HBase is a database whose files are stored in HDFS, how does it enable random access to a single piece of data within HDFS? What mechanism makes this possible?

From the Apache HBase Reference Guide:

HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups. See the Chapter 5, Data Model and the rest of this chapter for more information on how HBase achieves its goals.

Scanning both chapters didn't reveal a high-level answer to this question.

So how does HBase enable random access to files stored in HDFS?

2 Answers
乱世女痞
Post #2 · 2019-03-15 17:57

HBase accesses HDFS files through the HFile format. You can check this URL for the details: http://hbase.apache.org/book/hfilev2.html

【Aperson】
Post #3 · 2019-03-15 18:20

HBase stores data in HFiles that are indexed (sorted) by key. Given a random key, the client can determine which region server to ask for the row. That region server can determine which region holds the row, and then do a binary search through the region's blocks to reach the correct row. This works because enough metadata is kept to know the number of blocks, the block size, and each block's start and end keys.
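The first step (key → region → region server) can be sketched in a few lines. This is a minimal illustration, not HBase's client code: the region boundaries, server names, and the `locate_region` helper are all hypothetical. In real HBase the client reads region boundaries from the `hbase:meta` catalog table and caches them; here they are just an in-memory sorted list searched with Python's `bisect`.

```python
import bisect

# Hypothetical region map for one table. Region i covers keys in
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]); the first region starts at
# the empty key, so every possible row key falls into exactly one region.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
REGION_SERVERS = ["rs1:16020", "rs2:16020", "rs3:16020", "rs1:16020"]


def locate_region(row_key: bytes) -> tuple[int, str]:
    """Return (region index, server) for the region whose key range holds row_key."""
    # bisect_right finds the first start key strictly greater than row_key;
    # the region just before it is the one whose range contains row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]
```

For example, `locate_region(b"horse")` returns `(1, "rs2:16020")`, because `"horse"` sorts between the start keys `"g"` and `"n"`. Because the start keys are sorted, the lookup costs O(log n) comparisons no matter how many regions the table has.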

For example, a table may contain 10 TB of data, but the table is broken up into regions of about 4 GB each. Each region has a start key and an end key. The client can get the list of regions for a table and determine which region holds the key it is looking for. Regions are broken up into blocks so that the region server can do a binary search through them. Blocks are essentially sorted lists of (key, attribute, value, version) entries. If you know the starting key of each block, you can determine which file to access and the byte offset (block) at which to start reading at each step of the binary search.
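The block-level step can be sketched the same way. In the example above a 10 TB table split into 4 GB regions yields about 2,500 regions; once the right region is found, the server still has to find the right block inside a file. The sketch below assumes a simplified, hypothetical in-memory `ToyHFile`: key-sorted cells split into blocks, plus an index holding each block's first key and byte offset. Real HFiles (see the HFile v2 link in the other answer) use a multi-level on-disk block index and optional block encoding, so treat this only as an illustration of "binary-search the index, then read one block". The 64 KB block size matches HBase's default HFile block size; the tiny `cells_per_block` is only there to keep the example small.

```python
import bisect
from typing import List, Optional, Tuple

# Simplified cell: (row_key, column, version, value). Assumes one cell per row key.
Cell = Tuple[bytes, bytes, int, bytes]

BLOCK_BYTES = 64 * 1024  # pretend every block occupies exactly 64 KB on disk


class ToyHFile:
    """Hypothetical in-memory stand-in for an HFile: key-sorted cells split
    into blocks, plus a block index of (first_key, byte_offset) pairs."""

    def __init__(self, cells: List[Cell], cells_per_block: int = 2) -> None:
        cells = sorted(cells, key=lambda c: c[0])  # cells are stored sorted by key
        self.blocks = [cells[i:i + cells_per_block]
                       for i in range(0, len(cells), cells_per_block)]
        # The block index: the first key of each block and where that block starts.
        self.first_keys = [block[0][0] for block in self.blocks]
        self.offsets = [i * BLOCK_BYTES for i in range(len(self.blocks))]

    def find_block(self, row_key: bytes) -> Optional[Tuple[int, int]]:
        """Binary-search the index for the only block that could hold row_key.
        Returns (block number, byte offset), or None if the key sorts before the file."""
        i = bisect.bisect_right(self.first_keys, row_key) - 1
        if i < 0:
            return None
        return i, self.offsets[i]

    def get(self, row_key: bytes) -> Optional[Cell]:
        located = self.find_block(row_key)
        if located is None:
            return None
        block_no, _offset = located  # a real reader would seek to _offset in HDFS
        # Read only that one block and scan it for the exact key.
        for cell in self.blocks[block_no]:
            if cell[0] == row_key:
                return cell
        return None
```

With five cells keyed `a`, `c`, `e`, `g`, `i` and two cells per block, the index holds the first keys `a`, `e`, `i`. A lookup for `g` binary-searches that index, lands on block 1 at byte offset 65536, and reads only that block, which is why a single-row read touches one block per file rather than the whole file.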
