I am trying to load de-serialized json events into different tables, based on the name of the event.
Right now I have all the events in the same table, the table has only two columns EventName and Payload (the payload stores the json representation of the event):
CREATE TABLE event( EventName STRING, Payload STRING)
So basically what i want is to load the data in the following table:
CREATE TABLE TempEvent ( Column1 STRING, Column2 STRING, Column3 STRING )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
And load the events with something like:
INSERT INTO TempEvent select Payload from event where EventName='TempEvent';
But hive is throwing an exception saying that the destination table has 3 columns, and the select statement just 1.
Is there other way to acomplish this or i am doing something wrong?.
The JSON serde requires a table with one JSON per line in order to use it. So it won't work with your input table because the line
is not a valid JSON. So first you need to move it into a new intermediate table which just contains valid JSON, and then populate the JSON serde table from there using
load data
:Then you can extract the columns like this:
Of course all of this processing would not be necessary if your source table was valid JSON originally, you could just create the TempEvent table as as an
external table
and pull data directly from it.