I wonder if it's possible to pivot a table in one pass in Apache Pig.
Input:
Id Column1 Column2 Column3
1 Row11 Row12 Row13
2 Row21 Row22 Row23
Output:
Id Name Value
1 Column1 Row11
1 Column2 Row12
1 Column3 Row13
2 Column1 Row21
2 Column2 Row22
2 Column3 Row23
The real data has dozens of columns.
I can do that with awk in one pass and then run it with Hadoop Streaming. But the majority of my code is in Apache Pig, so I wonder if it's possible to do it in Pig efficiently.
You can do it in two ways:
1. Write a UDF which returns a bag of tuples. It will be the most flexible solution, but requires Java code.
2. Write a rigid script like this:
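A minimal sketch of what such a rigid script might look like, assuming a space-delimited input file at a hypothetical path `pivot.txt` with the four columns from the question. The relation names (`inpt`, `bagged`, `pivoted`) and the declared types are illustrative assumptions. It pairs each column value with its column name using the built-in `TOTUPLE`, collects the pairs with `TOBAG`, and un-nests them with `FLATTEN`, so each input row is read only once:

```pig
-- Hypothetical path and schema; adjust to the real data.
inpt = LOAD 'pivot.txt' USING PigStorage(' ')
       AS (Id:int, Column1:chararray, Column2:chararray, Column3:chararray);

-- Build one (name, value) tuple per column and put them all in a bag.
bagged = FOREACH inpt GENERATE
             Id,
             TOBAG(TOTUPLE('Column1', Column1),
                   TOTUPLE('Column2', Column2),
                   TOTUPLE('Column3', Column3)) AS toPivot;

-- FLATTEN un-nests the bag, emitting one (Id, Name, Value) row per tuple.
pivoted = FOREACH bagged GENERATE
              Id,
              FLATTEN(toPivot) AS (Name:chararray, Value:chararray);

DUMP pivoted;
```

The script is rigid because every column has to be listed by hand: with dozens of columns, the `TOBAG` call grows by one `TOTUPLE` per column. If rows with missing values should be dropped, a step such as `FILTER pivoted BY Value IS NOT NULL;` could follow.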
Running this script gave me the following results:
I removed Column3 from Id 1 to show how to handle optional (NULL) data:
Id Name Value
1 Column1 Row11
1 Column2 Row12
2 Column1 Row21
2 Column2 Row22
2 Column3 Row23
--pigscript.pig
Results:
(1,Row11,Row12,NULL)
(2,Row21,Row22,Row23)