A little spoon-feeding required: how do I import complex JSON into Hive? The JSON file is in the format:
{"some-headers":"", "dump":[{"item-id":"item-1"},{"item-id":"item-2"},...]}
The Hive table should have the fields given under dump. The JSON file is currently no larger than 200 MB, but since it is a dump, it will grow to several GB very soon. Any other possible methods would be greatly appreciated.
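One preprocessing option, sketched here under assumptions: JSON SerDes for Hive, including the one linked in the answers, generally expect one JSON record per line, whereas this file is a single object with the records nested inside the dump array. The Python sketch below writes each element of dump as its own compact line, so each item can map straight to a Hive row without an explode step. The function names and file paths are hypothetical. Note that json.loads reads the whole input into memory, which is workable around 200 MB but not for multi-GB dumps; those would need a streaming parser (for example the third-party ijson package).

```python
import json


def dump_to_json_lines(text, array_field="dump"):
    """Return one compact JSON string per element of the top-level array.

    json.dumps escapes newlines inside string values, so every returned
    string is guaranteed to fit on a single line.
    """
    document = json.loads(text)  # loads the entire document into memory
    return [
        json.dumps(item, separators=(",", ":"))  # compact, single-line record
        for item in document.get(array_field, [])
    ]


def convert_file(src_path, dst_path, array_field="dump"):
    """Hypothetical helper: convert a dump file into newline-delimited JSON."""
    with open(src_path, encoding="utf-8") as src:
        lines = dump_to_json_lines(src.read(), array_field)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            dst.write(line + "\n")
    return len(lines)  # number of records written
```

For example, convert_file("dump.json", "dump.jsonl") (hypothetical file names) produces a file whose lines look like {"item-id":"item-1"}, one item per line.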
Posting an end-to-end solution. Step-by-step procedure to convert JSON to a Hive table:
step 1) install Maven if it is not already installed
>$ sudo apt-get install maven
step 2) install git if it is not already installed, then clone the Hive-JSON-Serde repository
>$ sudo apt-get install git
>$ sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git
step 3) go into the $HOME/Hive-JSON-Serde folder created by the clone (assuming it was run from $HOME)
>$ cd $HOME/Hive-JSON-Serde
step 4) build the SerDe package
>$ sudo mvn -Pcdh5 clean package
step 5) the SerDe JAR will be at $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar
step 6) add the SerDe JAR to Hive as a dependency (for example with ADD JAR followed by the full path of the JAR from step 5)
step 7) create an example JSON file at $HOME/books.json
step 8) create a tmp1 table in Hive
step 9) load the data from the JSON file into the tmp1 table
step 10) create a tmp2 table to do the explode operation from tmp1; this intermediate step breaks the multi-level JSON structure into multiple rows. Note: if your JSON structure is simple and single-level, skip this step
step 11) create the Hive table and load the values from the tmp2 table
step 12) drop the tmp tables
step 13) test the Hive table
output:
id  name  subscription  unit
1   B     1year         3
2   B     2years        5
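To make steps 8 to 11 more concrete, here is a rough Python emulation of what the intermediate tables do. It is not Hive code, and the books.json contents (a books array whose elements carry id, name and nested properties) are a hypothetical stand-in, since the example file itself is not shown. Each input line plays the role of a tmp1 row, the explode generator mirrors Hive's LATERAL VIEW explode() producing tmp2, and flatten mirrors the final SELECT that pulls nested fields into flat columns.

```python
import json

# Hypothetical contents of $HOME/books.json: one JSON object on a single line,
# whose "books" array is the nested structure that step 10 explodes.
BOOKS_JSON = (
    '{"books": ['
    '{"id": 1, "name": "A", "properties": {"subscription": "1year", "unit": 3}}, '
    '{"id": 2, "name": "B", "properties": {"subscription": "2years", "unit": 5}}'
    ']}'
)


def explode(record, array_field):
    """Yield one element per array entry, like LATERAL VIEW explode(...)."""
    for element in record.get(array_field, []):
        yield element


def flatten(element):
    """Project nested fields into flat columns, like the SELECT in step 11."""
    props = element.get("properties", {})
    return (element["id"], element["name"], props.get("subscription"), props.get("unit"))


def build_table(json_lines, array_field):
    rows = []
    for line in json_lines:                           # step 9: one tmp1 row per line
        record = json.loads(line)
        for element in explode(record, array_field):  # step 10: rows of tmp2
            rows.append(flatten(element))             # step 11: final table rows
    return rows


if __name__ == "__main__":
    for row in build_table([BOOKS_JSON], "books"):
        print(*row)
```

Running it prints one row per array element (1 A 1year 3, then 2 B 2years 5), analogous to the output above.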
You can import JSON into Hive by using (or implementing) a Hive SerDe (serializer/deserializer).
The following project is a sample implementation:
https://github.com/rcongiu/Hive-JSON-Serde
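The deserialize side of a SerDe essentially turns each stored line into a row of column values. The linked project is a Java class plugged into Hive, so the Python class below is only a hypothetical illustration of that idea: one JSON object per line, values returned in table-column order, missing keys returned as None (NULL in Hive), and an optional column-to-key mapping for JSON keys such as item-id that do not make convenient Hive column names. The class name and its mapping argument are assumptions; the linked project's README describes a similar option for mapping column names to JSON keys.

```python
import json


class JsonLineDeserializer:
    """Hypothetical stand-in for the deserialize step of a Hive JSON SerDe."""

    def __init__(self, columns, mapping=None):
        self.columns = list(columns)        # table columns, in order
        self.mapping = dict(mapping or {})  # column name -> JSON key

    def deserialize(self, line):
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("each line must hold one JSON object")
        # Look up each column's JSON key; absent keys become None (like NULL)
        return [record.get(self.mapping.get(col, col)) for col in self.columns]


if __name__ == "__main__":
    serde = JsonLineDeserializer(["item_id"], mapping={"item_id": "item-id"})
    for line in ['{"item-id":"item-1"}', '{"item-id":"item-2"}']:
        print(serde.deserialize(line))
```

Keeping the mapping outside the data means the JSON files never need rewriting when a key is awkward as a column name.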
You can also refer to this related question:
How do you make a HIVE table out of JSON data?