I'm trying to load an extract of a pig script as an external table in HIVE. Pig enclosed each row between brackets () (tuples?) like this:
(1,2,3,a)
(2,4,5,b)
(4,2,6,c)
and I can't find a way to tell HIVE to ignore those brackets which results in null values for the first column as it is actually an integer.
Any thoughts on how to proceed?
I know I can use a FLATTEN command in PIG but I would also like to learn how to deal with these files directly from HIVE.
There is no way to do this in one step. You'd have to have another step, be it the use of
flatten
in Pig or an extra HiveINSERT INTO
.In Hive you could use
split(string field, string pattern)
several times to read from your external table and create the columns you want and then load that into a new table. However I'd always lean towards having Pig output into the format you want, unless something else is reading from this file that expects the data in that format. It will save an expensive re-read of all your data.As Ben said there is no way to do in one step.. but you can do it by creating one more temp table in hive.
Not sure if I am making it more complicated with one more table.. but it worked for me.
Now lets insert data
I know type casting will hit performance.. but for the given scenario this is the best I could come up with.