I have a table data like below and I want to pivot the data with aggregation .
ColumnA ColumnB ColumnC
1 complete Yes
1 complete Yes
2 In progress No
2 In progress No
3 Not yet started initiate
3 Not yet started initiate
Want to Pivot like below
ColumnA Complete In progress Not yet started
1 2 0 0
2 0 2 0
3 0 0 2
Is there anyway that we can achieve this in hive or Impala?
Use case
with sum
aggregation:
select ColumnA,
sum(case when ColumnB='complete' then 1 else 0 end) as Complete,
sum(case when ColumnB='In progress' then 1 else 0 end) as In_progress,
sum(case when ColumnB='Not yet started' then 1 else 0 end) as Not_yet_started
from table
group by ColumnA
order by ColumnA --remove if order is not necessary
;
This is how you can do this in spark scala.
val conf = spark.sparkContext.hadoopConfiguration
val test = spark.sparkContext.parallelize(List( ("1", "Complete", "yes"),
("1", "Complete", "yes"),
("2", "Inprogress", "no"),
("2", "Inprogress", "no"),
("3", "Not yet started", "initiate"),
("3", "Not yet started", "initiate"))
).toDF("ColumnA","ColumnB","ColumnC")
test.show()
val test_pivot = test.groupBy("ColumnA")
.pivot("ColumnB")
.agg(count("columnC"))
test_pivot.na.fill(0)show(false)
}
and the output
|ColumnA|Complete|Inprogress|Not yet started|
+-------+--------+----------+---------------+
|3 |0 |0 |2 |
|1 |2 |0 |0 |
|2 |0 |2 |0 |
+-------+--------+----------+---------------+