How can I tranfer a HBase table into Hive correctly?
What I tried before can you read in this question
How insert overwrite table in hive with diffrent where clauses?
( I made one table to import all data. The problem here is that data is still in rows and not in columns. So I made 3 tables for news, social and all with a specific where clause. After that I made 2 Joins on the tables which is giving me the result table. So I had 6 Tables at all which is not really performant!)
to sum my problem up : In HBase are column familys which are saved as rows like this.
count verpassen news 1
count verpassen social 0
count verpassen all 1
What I want to achieve in Hive is a datastructure like this:
name news social all
verpassen 1 0 1
How am I supposed to do this?
Below is the approach use can use.
use hbase storage handler to create the table in hive
example script
CREATE TABLE hbase_table_1(key string, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:val")
TBLPROPERTIES ("hbase.table.name" = "test");
I loaded the sample data you have given into hive external table.
select name,collect_set(concat_ws(',',type,val)) input from TESTTABLE
group by name ;
i am grouping the data by name.The resultant output for the above query will be
Now i wrote a custom mapper which takes the input as input parameter and emits the values.
from (select '["all,1","social,0","news,1"]' input from TESTTABLE group by name) d MAP d.input Using 'python test.py' as
all,social,news
alternatively you can use the output to insert into another table which has column names name,all,social,news
Hope this helps