This has been asked and answered for SQL (Convert multiple rows into one with comma as separator). Would any of the approaches mentioned there work in Hive, e.g. to go from this:
+------+------+
| Col1 | Col2 |
+------+------+
| a | 1 |
| a | 5 |
| a | 6 |
| b | 2 |
| b | 6 |
+------+------+
to this:
+------+-------+
| Col1 | Col2 |
+------+-------+
| a | 1,5,6 |
| b | 2,6 |
+------+-------+
There is also collect_list, which returns the full list (with duplicates). See the apache.org documentation.
The aggregate function collect_set can achieve what you are trying to get. Here is the documentation. You can use it in a query that groups by Col1.
However, there is one striking difference between MySQL's GROUP_CONCAT and Hive's collect_set: while GROUP_CONCAT retains duplicates in its result, collect_set removes any duplicates occurring in the array. In the example you show there are no repeating Col2 values within a group, so you can go ahead and use it.
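A minimal sketch of such a query, assuming the data lives in a table called my_table (a hypothetical name, not from the question) and that Col2 is numeric. concat_ws expects strings or an array of strings, so Col2 is cast first. collect_list should only be available on newer Hive releases (0.13 and later, as far as I know), so treat the second query as an option rather than a given.

```sql
-- my_table is a placeholder name; substitute your own table.

-- Distinct values per group (duplicates removed by collect_set)
SELECT Col1,
       concat_ws(',', collect_set(CAST(Col2 AS STRING))) AS Col2
FROM my_table
GROUP BY Col1;

-- All values per group, duplicates kept (closer to MySQL's GROUP_CONCAT)
SELECT Col1,
       concat_ws(',', collect_list(CAST(Col2 AS STRING))) AS Col2
FROM my_table
GROUP BY Col1;
```

Neither function guarantees the order of elements in the array, so the values inside each comma-separated string may not come out sorted as in the question's example.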