Hive-HBase integration throws ClassNotFoundException

Posted 2019-08-15 01:59

Question:

I'm following this guide: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily

I'm trying to integrate Hive and HBase. I have this configuration in hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>
    file:///$HIVE_HOME/lib/hive-hbase-handler-2.0.0.jar,
    file:///$HIVE_HOME/lib/hive-ant-2.0.0.jar,
    file:///$HIVE_HOME/lib/protobuf-java-2.5.0.jar,
    file:///$HIVE_HOME/lib/hbase-client-1.1.1.jar,
    file:///$HIVE_HOME/lib/hbase-common-1.1.1.jar,
    file:///$HIVE_HOME/lib/zookeeper-3.4.6.jar,
    file:///$HIVE_HOME/lib/guava-14.0.1.jar
  </value>
</property>

Then I created a table named 'ts:testTable' in HBase:

hbase> create 'ts:testTable','pokes'
hbase> put 'ts:testTable', '10000', 'pokes:value','val_10000'
hbase> put 'ts:testTable', '10001', 'pokes:value','val_10001'
...

hbase> scan  'ts:testTable'
ROW                       COLUMN+CELL
 10000                    column=pokes:value, timestamp=1462782972084, value=val_10000
 10001                    column=pokes:value, timestamp=1462783514212, value=val_10001
....

Then I created an external table in Hive:

Hive> CREATE EXTERNAL TABLE hbase_test_table(key int, value string )
       STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
       WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, pokes:value")
       TBLPROPERTIES ("hbase.table.name" = "ts:testTable",
       "hbase.mapred.output.outputtable" = "ts:testTable");

So far so good. But when I tried to select data from the test table, an exception was thrown:

Hive> select * from hbase_test_table;
FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character varying
Error: Error while compiling statement: FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character varying (state=42000,code=40000)

Am I missing anything?

I'm using Hive 2.0.0 with HBase 1.2.1.

Answer 1:

OK, I figured it out. The string "NULL::character varying" is not part of Hive; it comes from PostgreSQL, which I'm using as the Metastore back end. The problem is that Hive doesn't handle this value when it comes back from PostgreSQL. This is the relevant code in Hive 2.0.0:

300: if (inputFormatClass == null) {
301:   try {
302:     String className = tTable.getSd().getInputFormat();
303:     if (className == null) {
304:       if (getStorageHandler() == null) {
305:         return null;
306:       }
307:       inputFormatClass = getStorageHandler().getInputFormatClass();
308:     } else {
309:       inputFormatClass = (Class<? extends InputFormat>)
310:         Class.forName(className, true, Utilities.getSessionSpecifiedClassLoader());
311:     }

Line 302 is supposed to return null here, but it returns the string "NULL::character varying" instead. The null check on line 303 therefore fails, and line 310 tries to load a class with that name, which does not exist. That is why the query fails.
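The failure mode can be reproduced outside Hive. The following is only a minimal sketch: the class name BogusClassNameDemo and the helper canLoad are made up for illustration and are not part of Hive. It uses the plain one-argument Class.forName rather than Hive's session class loader.

```java
public class BogusClassNameDemo {

    /**
     * Hypothetical helper (not from Hive): returns true if the name
     * can be loaded as a class, false if loading fails with
     * ClassNotFoundException.
     */
    public static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Same exception type that Hive wraps in a RuntimeException.
            return false;
        }
    }

    public static void main(String[] args) {
        // A real class name loads fine.
        System.out.println(canLoad("java.lang.String"));
        // The value coming back from the metastore is not a class name.
        System.out.println(canLoad("NULL::character varying"));
    }
}
```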

I believe it is a compatibility bug. The clean fix would be to change the database, which I'd rather not do. So I simply replaced the null check on line 303 with

if (className == null || className.toLowerCase().startsWith("null::")) {

I did the same thing in the getOutputFormat() method and then recompiled the jar. That's it.
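For reference, the modified check can be tried in isolation. This is a sketch under assumptions, not the actual Hive patch: the class ClassNameGuard and the method isUnset are hypothetical names, and Locale.ROOT is passed to toLowerCase so the result does not depend on the JVM's default locale (the one-line change above uses the no-argument form).

```java
import java.util.Locale;

public class ClassNameGuard {

    /**
     * Hypothetical helper (not from Hive): treats both a real null and
     * a PostgreSQL-style "NULL::..." string as "no class name set".
     */
    public static boolean isUnset(String className) {
        return className == null
            || className.toLowerCase(Locale.ROOT).startsWith("null::");
    }

    public static void main(String[] args) {
        // Values the guard should treat as missing.
        System.out.println(isUnset(null));
        System.out.println(isUnset("NULL::character varying"));
        // A real class name must still go through Class.forName.
        System.out.println(isUnset("org.apache.hadoop.mapred.TextInputFormat"));
    }
}
```

With a check like this, the code falls through to the storage handler branch (lines 304 to 307) instead of calling Class.forName on the bogus string.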