
Question:

My data does not need to be loaded in real time, so I don't have to use HBase. But I was wondering whether there are any performance benefits to using HBase in MapReduce (MR) jobs. Shouldn't joins be faster because the data is indexed?

Does anybody have any benchmarks?

Answer 1:

Generally speaking, Hive on plain HDFS will be significantly faster than HBase for batch processing like this. HBase sits on top of HDFS, so it adds another layer. HBase would be faster if you were looking up individual records, but you wouldn't use an MR job for that.
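To make the lookup-versus-scan distinction concrete, here is a toy Python model. It is an analogy under stated assumptions, not HBase's real storage engine: rows kept sorted by key (loosely like HBase row keys) turn a single-key lookup into a binary search, but an aggregation over every row, which is what an MR job typically does, still touches all n rows, so the key ordering buys nothing there. The row keys, values, and function names are hypothetical.

```python
import bisect

# Toy model only: (row_key, value) pairs kept sorted by key, loosely analogous
# to HBase storing rows ordered by row key. No regions, HFiles, or caches here.
rows = sorted((f"user{i:05d}", i * 10) for i in range(10_000))
keys = [k for k, _ in rows]


def point_lookup(key):
    """Find one row by key with binary search: O(log n) comparisons."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i][1]
    return None


def full_scan_sum():
    """Aggregate over every row: O(n) regardless of how the rows are keyed."""
    return sum(value for _, value in rows)


print(point_lookup("user00042"))  # 420
print(full_scan_sum())            # 499950000
```

The sorted layout only pays off in `point_lookup`; `full_scan_sum` does the same amount of work whether or not the rows are ordered, which mirrors why a full-table MR job gains little from HBase's key index.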

Answer 2:

Performance of HBase vs. Hive:

Based on the results for HBase, Hive, and Hive on HBase, the performance of these approaches appears to be comparable.

Hive on HBase Performance

Answer 3:

Respectfully :) I'd say that if your data is not real-time and you are planning MapReduce jobs anyway, then just go with Hive over HDFS. Weblogs, for example, can be processed by a Hadoop MapReduce program and stored in HDFS, and Hive supports fast reading of the data in the HDFS location, basic SQL, joins, and batch data loads into the Hive database.
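On the question's point about indexed joins: a join between two large tables in Hive on MapReduce typically runs as a reduce-side (shuffle) join. Below is a simplified, single-process simulation of that pattern, not Hive's actual implementation; the `users` and `clicks` tables and the helper names are hypothetical. Every row of both inputs is read, tagged with its source, and shuffled by join key, so a row-key index on either side would not be consulted in this pattern.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample tables: (user_id, name) and (user_id, url).
users = [(1, "alice"), (2, "bob"), (3, "carol")]
clicks = [(1, "/home"), (1, "/cart"), (3, "/search"), (4, "/about")]


def map_phase():
    """Mappers emit (join_key, (source_tag, payload)) for every input row."""
    for user_id, name in users:
        yield user_id, ("users", name)
    for user_id, url in clicks:
        yield user_id, ("clicks", url)


def shuffle(pairs):
    """The framework groups all mapper output by key (simulated with a dict)."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(groups):
    """Each reducer sees all rows for one key and emits the inner-join product."""
    for key in sorted(groups):
        left = [p for tag, p in groups[key] if tag == "users"]
        right = [p for tag, p in groups[key] if tag == "clicks"]
        for name, url in product(left, right):
            yield key, name, url


joined = list(reduce_phase(shuffle(map_phase())))
for row in joined:
    print(row)
```

Keys 2 and 4 appear on only one side, so the inner join drops them; the cost is dominated by reading and shuffling all rows, not by finding matches.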
Hive also gives you:
- Bulk processing (and real time, if possible)
- A SQL-like interface
- Built-in optimized MapReduce
- Partitioning of large data, which fits naturally with HDFS and avoids the extra HBase layer. If you add HBase here, it would mostly give you redundant features :)
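Partitioning helps because Hive stores each partition of a table as its own HDFS subdirectory named `column=value`, so a query that filters on the partition column can skip the other directories entirely. A minimal sketch of that pruning step, assuming a hypothetical `weblogs` table under a hypothetical `/warehouse` path, partitioned by a hypothetical `dt` column (this is not Hive's planner code):

```python
# Hypothetical table layout; Hive names partition directories column=value.
paths = [
    "/warehouse/weblogs/dt=2024-01-01/part-00000",
    "/warehouse/weblogs/dt=2024-01-01/part-00001",
    "/warehouse/weblogs/dt=2024-01-02/part-00000",
    "/warehouse/weblogs/dt=2024-01-03/part-00000",
]


def partition_values(path):
    """Parse the column=value directory names in a path into a dict."""
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            values[column] = value
    return values


def prune(paths, column, wanted):
    """Keep only files whose partition directory matches the filter."""
    return [p for p in paths if partition_values(p).get(column) == wanted]


selected = prune(paths, "dt", "2024-01-02")
print(selected)  # ['/warehouse/weblogs/dt=2024-01-02/part-00000']
```

Only one of the four files would be read for that filter, which is the kind of saving partitioning gives without adding an HBase layer.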