I want to transpose the following table using Spark Scala without the pivot function
I am using Spark 1.5.1, and the pivot function is not supported in 1.5.1. Please suggest a suitable method to transpose the following table:
Customer Day Sales
1 Mon 12
1 Tue 10
1 Thu 15
1 Fri 2
2 Sun 10
2 Wed 5
2 Thu 4
2 Fri 3
Output table:
Customer Sun Mon Tue Wed Thu Fri
1 0 12 10 0 15 2
2 10 0 0 5 4 3
The following code does not work because I am using Spark 1.5.1, and the pivot function is only available from Spark 1.6:
var Trans = Cust_Sales.groupBy("Customer").pivot("Day").sum("Sales")
If you are working with Python, the code below might help. Let's say you want to transpose a Spark DataFrame df:
Consider a DataFrame that has 6 columns, where we want to group by the first 4 columns and pivot on col5 while aggregating col6 (say, summing it). If you cannot use Spark 1.6, the same thing can be written in Spark 1.5 as:
Here is code that gives the same output without using the built-in pivot function:
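A minimal Scala sketch of this pivot-free approach, not the original poster's code. It assumes the pivot column holds string values with no nulls, that there are few enough distinct values to collect to the driver, and that the DataFrame is not empty. `ManualPivot` and `pivotSum` are hypothetical names made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum, when}

object ManualPivot {
  // Hypothetical helper: emulates groupBy(groupCols).pivot(pivotCol).sum(valueCol)
  // using only functions that exist in Spark 1.5.
  def pivotSum(df: DataFrame,
               groupCols: Seq[String],
               pivotCol: String,
               valueCol: String): DataFrame = {
    // Pull the distinct pivot values to the driver (assumed: few values, no nulls).
    val pivotValues = df.select(pivotCol).distinct().collect()
      .map(_.get(0).toString)
      .sorted

    // One conditional sum per pivot value; non-matching rows contribute 0.
    val aggregates = pivotValues.map { v =>
      sum(when(col(pivotCol) === v, col(valueCol)).otherwise(0)).as(v)
    }

    // agg takes a first column plus varargs, hence head / tail.
    df.groupBy(groupCols.map(col): _*).agg(aggregates.head, aggregates.tail: _*)
  }
}
```

For the 6-column case above, the call would look like `ManualPivot.pivotSum(df, Seq("col1", "col2", "col3", "col4"), "col5", "col6")`, where the column names are placeholders.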
Not sure how efficient that is, but you can use collect to get all the distinct days, then add these columns, and then use groupBy and sum:
Which prints (almost) what you wanted:
I'll leave it to you to rename / reorder the columns if needed.
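The collect / add columns / groupBy / sum steps described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact code originally posted: it assumes a DataFrame with the question's Customer, Day and Sales columns, a small number of distinct days, and no null days. `TransposeSales` and `transposeByDay` are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

object TransposeSales {
  // Hypothetical helper following the steps described in the answer.
  def transposeByDay(custSales: DataFrame): DataFrame = {
    // 1. Collect the distinct days to the driver (assumed to be a small set).
    val days = custSales.select("Day").distinct().collect().map(_.getString(0))

    // 2. Add one column per day: the Sales value where the row's Day matches, else 0.
    val withDayColumns = days.foldLeft(custSales) { (data, day) =>
      data.withColumn(day, when(col("Day") === day, col("Sales")).otherwise(0))
    }

    // 3. Drop the original columns, then group by customer and sum each day column.
    withDayColumns
      .drop("Day")
      .drop("Sales")
      .groupBy("Customer")
      .sum(days: _*)
  }
}
```

Calling `TransposeSales.transposeByDay(Cust_Sales).show()` should give one row per customer. The summed columns are expected to come back with aggregate-style names such as sum(Mon) and in no particular order, which is why renaming and reordering is left as a final step.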