HBase column family

Posted 2019-03-26 14:21

Question:

The HBase documentation says to avoid creating more than 2-3 column families because HBase does not handle more than that very well. The reason is compaction and flushing, and hence the I/O. However, if all my columns are always populated (for every row), then I think this reasoning is less important. So, given that my access to columns is completely random (I want to access any combination of columns), can I use a configuration with one column family per column (effectively trying to make it purely columnar)?

There are many blogs/wikis explaining this, but they all seem to contradict each other and add more confusion. I just can't digest the fact that HBase prefers one column family; then what's the point of calling it a column store?

Answer 1:

Currently (though this is expected to change), all of the column families for a region are flushed together. This is the primary reason why people say "HBase doesn't do well with more than 2 or 3 column families". Consider two CFs, each with one column. Column A:A stores the full text of web pages. Column B:B stores the number of words in each page. Every time we flush A:A (which happens more often because A:A's data is far bigger), we also have to go through a whole separate file I/O juggling routine for column B:B, even though there is no need to: since B:B only holds numbers, it could go for months without being flushed.
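To make the A:A / B:B example concrete, here is a toy simulation in Python. It is only a sketch of the "all families in a region flush together" behaviour described above, not HBase's actual flush logic; the function name `simulate_flushes`, the byte sizes, and the threshold are made-up values chosen for illustration.

```python
def simulate_flushes(writes, flush_threshold):
    """Toy model: one region, one in-memory buffer per column family.

    `writes` is an iterable of dicts mapping family name -> bytes written.
    When the combined buffer size reaches `flush_threshold`, every family
    with buffered data is flushed to its own file (assumption: families
    are always flushed together, as the answer describes).
    Returns a dict mapping family name -> list of flushed file sizes.
    """
    memstore = {}
    files = {}
    for write in writes:
        for family, nbytes in write.items():
            memstore[family] = memstore.get(family, 0) + nbytes
        if sum(memstore.values()) >= flush_threshold:
            for family, size in memstore.items():
                if size > 0:
                    files.setdefault(family, []).append(size)
            memstore = {family: 0 for family in memstore}
    return files


PAGE_BYTES = 100_000       # hypothetical size of one page text in A:A
COUNT_BYTES = 8            # hypothetical size of one word count in B:B
FLUSH_THRESHOLD = 1_000_000
ROWS = 1000

# Two families: page text in A, word count in B.
two_cf = simulate_flushes(
    ({"A": PAGE_BYTES, "B": COUNT_BYTES} for _ in range(ROWS)),
    FLUSH_THRESHOLD,
)

# One family holding both columns (A:A and A:B).
one_cf = simulate_flushes(
    ({"A": PAGE_BYTES + COUNT_BYTES} for _ in range(ROWS)),
    FLUSH_THRESHOLD,
)

total_two = sum(len(sizes) for sizes in two_cf.values())
total_one = sum(len(sizes) for sizes in one_cf.values())
print("files with two families:", total_two)
print("files with one family:  ", total_one)
print("typical B file size in bytes:", two_cf["B"][0])
```

In this model the two-family layout produces twice as many files, and every file for B is tiny, which is the extra I/O the answer is pointing at. Real numbers depend on your HBase version and configuration.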

If you store A and B in the same column family (A:A and A:B), you will probably see vastly better flush I/O performance, and because most HBase reads are purely from the memstore, you will probably find that read speeds are equivalent.

Also, and perhaps more importantly, if the cardinality of the columns is wildly different, then your regionservers will need to maintain useless mostly-empty files for your less-dense column families. This will never change.
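The cardinality point can be illustrated the same way. The sketch below assumes a hypothetical table where a "dense" family is populated on every row and a "sparse" family on only one row in 500, and it assumes a flush happens after every 1000 rows; none of these numbers come from HBase itself.

```python
ROWS = 10_000
SPARSE_EVERY = 500      # hypothetical: sparse family populated on 1 row in 500
ROWS_PER_FLUSH = 1000   # hypothetical: flush after every 1000 rows

dense_files = []   # number of cells in each flushed file of the dense family
sparse_files = []  # number of cells in each flushed file of the sparse family
dense_buffer = 0
sparse_buffer = 0

for row in range(ROWS):
    dense_buffer += 1
    if row % SPARSE_EVERY == 0:
        sparse_buffer += 1
    if (row + 1) % ROWS_PER_FLUSH == 0:
        # Both families are written out together, however little one holds.
        dense_files.append(dense_buffer)
        sparse_files.append(sparse_buffer)
        dense_buffer = 0
        sparse_buffer = 0

print("dense family cells per file: ", dense_files)
print("sparse family cells per file:", sparse_files)
```

Each flush leaves the sparse family with a file holding only a couple of cells, which is the "mostly-empty files" overhead mentioned above.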

All of this is available in the HBase Book.

So, as in all such performance situations, measure before deciding what the "correct" path is.



Tags: hbase