Hive - Split delimited columns over multiple rows,

2019-05-11 02:04发布

问题:

I'm Looking for a way to split the column based on comma delimited data. Below is my dataset

id  col1  col2
1   5,6   7,8

I want to get the result

id col1 col2
1  5    7
1  6    8

The position of the index should match because I need to fetch results accordingly.

I tried the below query but it returns the cartesian product.

Query:

SELECT col3, col4
FROM test ext 
lateral VIEW explode(split(col1,'\002')) col1 AS col3
lateral VIEW explode(split(col2,'\002')) col2 AS col4

Result:

id col1 col2
1  5    7
1  5    8
1  6    7
1  6    8

回答1:

You can use posexplode() to create position index columns for your split arrays. Then, select only those rows where the position indices are equal.

SELECT id, col3, col4
  FROM test
  lateral VIEW posexplode(split(col1,'\002')) col1 AS pos3, col3
  lateral VIEW posexplode(split(col2,'\002')) col2 AS pos4, col4
  WHERE pos3 = pos4;

Output:

id col3 col4
1  5    7
1  6    8

Reference: Hive language manual - posexplode()