I have the following schema dataset which i want to transform into a table that can be exported to SQL. I am using HIVE
. Input as follows
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id
as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when i use the keyword DISTINCT
in the HIVE
query, the DISTINCT
applies to the all the colums combined. I want to apply the DISTINCT operation only to the call_id
. Something on the lines of
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However this is not valid in HIVE
(I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with same call_id
as the other columns are different and the row on the whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how i can do this?