import complex Json data to hive

2019-02-20 19:53发布

A little spoon feeding required, how to import complex json into hive. Json file in the format of:{"some-headers":"", "dump":[{"item-id":"item-1"},{"item-id":"item-2"},...]}. Hive to have fields given under dump. Json file size, as now ,is not exceeding 200MB, but since its a dump, it will reach GBs very soon. Any other possible methods shall be greatly appreciated.

2条回答
萌系小妹纸
2楼-- · 2019-02-20 20:12

posting End-to-End solution. Step by step procedure to convert JSON to hive table:

step 1) install maven if not there already

>$ sudo apt-get install maven

step 2) install git if not there already

>sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git

step 3) go into the $HOME/HIVE-JSON_Serde folder

step 4) build the serde package

>sudo mvn -Pcdh5 clean package

step 5) The serde file will be in $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar

step 6) Add serde as dependency jar in hive

 hive> ADD JAR $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7- SNAPSHOT-jar-with-dependencies.jar;

step 7) create json file in $HOME/books.json (Example)

{"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}

step 8) create tmp1 table in hive

 hive>CREATE TABLE tmp1 (
      value ARRAY<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>   
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 
    'mapping.value' = 'value'   
) 
STORED AS TEXTFILE;

step 9) load the data from json to tmp1 table

>LOAD DATA LOCAL INPATH '$HOME/books.json' INTO TABLE tmp1;

step 10) create a tmp2 table to do explode operation form tmp1, this intermediate step is to break multi level json structure into multiple rows Note: if your JSON structure is simple and single level , avoid this step

hive>create table tmp2 as 
 SELECT *
 FROM tmp1
 LATERAL VIEW explode(value) itemTable AS items;

step 11) create hive table and load the values from tmp2 table

hive>create table books as 
select value[0].id as id, value[0].bookname as name, value[0].properties.subscription as subscription, value[0].properties.unit as unit from tmp2;

step 12) drop tmp tables

hive>drop table tmp1;
hive>drop table tmp2;

step 13) test the hive table

hive>select * from books;

output:

id name subscription unit

1 B 1year 3

2 B 2years 5

查看更多
一夜七次
3楼-- · 2019-02-20 20:19

You can import JSON into Hive by implementing the HiveSerDe.

This link serves as a sample implementation.

https://github.com/rcongiu/Hive-JSON-Serde

You can also refer to these links

How do you make a HIVE table out of JSON data?

查看更多
登录 后发表回答