I want to insert JSON data from one table to other tables based on the key fields on the data.
my data looks like this
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
Here based on Version, whther it is "BOX" or "GAP" or "ABC" i want to populate the fields on particular JSON rows to another table
for ex: if version is "GAP" then populate particular rows in one table if it is "BOX" then populate to another tables... I mean all rows for BOX...
how can I achieve this using HIVE. Please help.
NOTE: My JSON data is in one table as a column with type string
Demo
First of all you will need to store your data as json in hive tables:
I assume your hive table is EXTERNAL (usually are - check it with
SHOW CREATE TABLE your_table
).If so the entire dataset lies in some hdfs/s3 path, for example
s3a://your_bucket/your_jsons_location/
Download json-udf-1.3.7-jar-with-dependencies.jar and run
ADD JARS s3a://your_bucket/lib/json-udf-1.3.7-jar-with-dependencies.jar;
Then you must create a dedicated json table for each json schema you have:
Now if you'll run:
you will see that each table parsed correctly only the matching jsons (according to the schema that was specified in the create statment). The unmatching ones in each table are NULL.
To filter out the unrelevant records run:
SELECT * FROM abcs WHERE Rtype.mod.version[0].abc IS NOT NULL;
Note: that this whole explanation assumes that your jsons are stored externally to the hive table (specifically I used S3 but it can also be HDFS)