How to transfer a table from HBase to Hive?

Posted 2019-03-06 09:34

Question:

How can I transfer an HBase table into Hive correctly?

What I tried before, you can read in this question: How insert overwrite table in hive with diffrent where clauses? (I made one table to import all the data. The problem there is that the data is still in rows and not in columns. So I made three tables for news, social and all, each with a specific WHERE clause, and then two joins on those tables to get the result table. That makes six tables in total, which is not really performant!)

To sum my problem up: in HBase there are column families which are saved as rows, like this:

count   verpassen   news     1
count   verpassen   social   0
count   verpassen   all      1

What I want to achieve in Hive is a data structure like this:

name      news    social   all
verpassen 1       0        1

How am I supposed to do this?

Answer 1:

Below is the approach you can use.

Use the HBase storage handler to create the table in Hive. For an HBase table that already exists, the Hive table has to be created as EXTERNAL.

Example script:

CREATE EXTERNAL TABLE hbase_table_1 (key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:val")
TBLPROPERTIES ("hbase.table.name" = "test");

I loaded the sample data you have given into a Hive external table.
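For reference, such an external table could look roughly like this (a minimal sketch, assuming tab-separated text files under a hypothetical location; the column names match the query below):

CREATE EXTERNAL TABLE TESTTABLE (
  name string,   -- e.g. verpassen
  type string,   -- news, social or all
  val  string    -- the count value
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/tmp/testtable';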

select name, collect_set(concat_ws(',', type, val)) input from TESTTABLE group by name;

I am grouping the data by name. The resultant output for the above query will be something like this (the order of the elements inside collect_set is not guaranteed):

verpassen   ["all,1","social,0","news,1"]

Now I wrote a custom mapper which takes this input as a parameter and emits the values (a sketch of such a script is shown below the query).

FROM (SELECT '["all,1","social,0","news,1"]' input FROM TESTTABLE GROUP BY name) d
MAP d.input USING 'python test.py' AS all, social, news;
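The mapper script itself is not shown in the answer; below is a minimal sketch of what test.py could look like. It is only an assumption: it treats the last tab-separated input column as the JSON-like string shown in the query above, parses it, and emits the all, social and news values as tab-separated columns, passing any leading columns (such as name) straight through.

#!/usr/bin/env python
# test.py -- sketch of the custom mapper (not the original author's script).
# Hive's MAP/TRANSFORM streams each row to stdin as tab-separated columns
# and reads tab-separated columns back from stdout.
import json
import sys

for line in sys.stdin:
    line = line.rstrip('\n')
    if not line:
        continue
    cols = line.split('\t')
    # the last column is the JSON-like string, e.g. ["all,1","social,0","news,1"]
    pairs = dict(item.split(',', 1) for item in json.loads(cols[-1]))
    # pass leading columns (e.g. name) through, then emit all, social, news
    out = cols[:-1] + [pairs.get('all', ''), pairs.get('social', ''), pairs.get('news', '')]
    print('\t'.join(out))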

Alternatively, you can use the mapper output to insert into another table which has the column names name, all, social and news.
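A rough sketch of that variant, with assumed names (result_table is hypothetical, and `all` is escaped with backticks because it can collide with a reserved keyword in newer Hive versions):

-- hypothetical target table with the structure asked for in the question
CREATE TABLE result_table (name string, news string, social string, `all` string);

-- wrap the MAP query from above (with name passed through) and write its output into the table
INSERT OVERWRITE TABLE result_table
SELECT m.name, m.news, m.social, m.`all`
FROM (
  FROM (SELECT name, '["all,1","social,0","news,1"]' input FROM TESTTABLE GROUP BY name) d
  MAP d.name, d.input USING 'python test.py' AS name, `all`, social, news
) m;

In practice you would replace the hard-coded example string with the collect_set(concat_ws(',', type, val)) expression; Hive then serializes the array before handing it to the script, so the parsing inside test.py may need to be adjusted accordingly.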

Hope this helps