Pivoting in Pig

Posted 2020-01-31 11:51

This is related to the question "Pivot table with Apache Pig". I have the input data as

Id    Name     Value 
1     Column1  Row11 
1     Column2  Row12 
1     Column3  Row13 
2     Column1  Row21 
2     Column2  Row22 
2     Column3  Row23

and want to pivot and get the output as

Id    Column1  Column2  Column3
1     Row11    Row12    Row13
2     Row21    Row22    Row23

Please let me know how to do this in Pig.

Tags: pivot apache-pig

2 answers

爷、活的狠高调

#2 · 2020-01-31 12:22

The simplest way to do it without a UDF is to group on Id, then in a nested foreach filter the rows for each column name, and finally combine them in the generate. See the script:

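-- Load the (Id, Name, Value) rows; the default loader expects tab-separated fields.
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
-- Collect all rows that share an Id into one bag.
grp = group inpt by Id;
-- For each Id, pick the row for each column name and emit its Value as a column.
-- Note: flattening an empty bag yields no rows, so an Id missing any of the three names is dropped.
maps = foreach grp {
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    generate flatten(group) as Id, flatten(col1.Value) as Column1, flatten(col2.Value) as Column2, flatten(col3.Value) as Column3;
};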

Output:

(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)

Another option would be to write a UDF which converts a bag{name, value} into a map[], then look up the values using the column names as keys (e.g. vals#'Column1').
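A minimal sketch of such a UDF, written as a Pig Python UDF. The function name bag_to_map and the schema name vals are made up for illustration, and it assumes each tuple in the bag has exactly two fields, (Name, Value). When a script is registered with "using jython", Pig should supply the outputSchema decorator itself; the fallback below is only there so the file can also be run outside Pig.

```python
# Sketch of a Pig Python UDF: bag {(Name, Value)} -> map keyed by Name.
# bag_to_map and the "vals" schema name are hypothetical, not from the answer above.
try:
    outputSchema  # expected to be provided by Pig when registered 'using jython'
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the function can be exercised outside Pig.
        def wrap(func):
            return func
        return wrap


@outputSchema("vals:map[]")
def bag_to_map(bag):
    # Pig hands a bag to Python as a list of tuples;
    # each tuple is assumed to be (Name, Value).
    result = {}
    for name, value in bag:
        # If a Name repeats within one Id, the last Value wins.
        result[name] = value
    return result
```

In the Pig script the grouped bag would first have to be projected down to the two fields, for example inpt.(Name, Value), before being passed to the UDF. A name that is missing for some Id then gives a null lookup instead of dropping the whole row.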


爷的心禁止访问

#3 · 2020-01-31 12:40

Not sure about Pig, but in Spark you could do this with a one-line command:

df.groupBy("Id").pivot("Name").agg(first("Value"))
