Concat single column fields using GROUP BY

2019-04-10 05:01发布

问题:

Is there any way to combine/concat the fields within one column by grouping them. Eg:

col1   col2
1     aa
1     bb
1     cc
2     dd
2     ee

I want to query something like :

select col1, concat(col2) from tableName group by col1;

Output should be :

1    aa,bb,cc
2    dd,ee

Is there any function in hive to do this ?

回答1:

Suppose you have a table test as follows:

select id, val from test order by id, val;     
2   aa
2   bb
1   bb
1   aa

You can use the HIVE function collect_set:

select id, collect_set(val) from test group by id;
1   ["aa","bb"]
2   ["bb","aa"]

But note that collect_set returns a set of objects with duplicate elements eliminated.

You can find more details at the Language Manual Wiki.



回答2:

you can use concat_ws() and collect_list() to achieve this....

Something like

select id , concat_ws(",", collect_list(val)) from test group by id; 

"," is the seperator in the above query.



标签: hadoop hive