Column to comma separated value in Hive

2020-06-03 03:18发布

It's been asked and answered for SQL (Convert multiple rows into one with comma as separator), would any of the approaches mentioned work in Hive, e.g. to go from this:

+------+------+
| Col1 | Col2 |
+------+------+
| a    | 1    |
| a    | 5    |
| a    | 6    |
| b    | 2    |
| b    | 6    |
+------+------+

to this:

+------+-------+
| Col1 | Col2  |
+------+-------+
| a    | 1,5,6 |
| b    | 2,6   |
+------+-------+

标签: hadoop hive
3条回答
家丑人穷心不美
2楼-- · 2020-06-03 04:08

And there is collect_list that will take full list (with duplicates).

查看更多
时光不老,我们不散
3楼-- · 2020-06-03 04:16

Try this

SELECT Col1, concat_ws(',', collect_set(Col2)) as col2
FROM your_table
GROUP BY Col1;

apache.org documentation

查看更多
Bombasti
4楼-- · 2020-06-03 04:18

The aggregator function collect_set can achieve what you are trying to get. Here is the documentation. So you can write a query like:

SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;

However, there is one striking difference between MySQL's GROUP BY and Hive's collect_set that while GROUP_CONCAT also retains duplicates in the resulting array, collect_set removes the duplicates occuring in the array. In the example shown by you there are no repeating group values for Col2 so you can go ahead and use it.

查看更多
登录 后发表回答