How to pivot data in Hive with aggregation

Posted 2019-07-23 20:05

I have table data like the following, and I want to pivot it with aggregation.

ColumnA    ColumnB            ColumnC
1          complete            Yes
1          complete            Yes
2          In progress         No
2          In progress         No 
3          Not yet started     initiate 
3          Not yet started     initiate 

I want to pivot it like this:

ColumnA          Complete    In progress     Not yet started
1                 2               0                0
2                 0               2                0
3                 0               0                2

Is there any way to achieve this in Hive or Impala?

2 answers
等我变得足够好
Reply #2 · 2019-07-23 20:19

This is how you can do it in Spark with Scala:

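    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.count

    // In spark-shell `spark` already exists; getOrCreate() simply returns it
    val spark = SparkSession.builder().appName("PivotExample").getOrCreate()
    import spark.implicits._ // enables .toDF on local collections

    val test = Seq(
      ("1", "Complete", "yes"),
      ("1", "Complete", "yes"),
      ("2", "Inprogress", "no"),
      ("2", "Inprogress", "no"),
      ("3", "Not yet started", "initiate"),
      ("3", "Not yet started", "initiate")
    ).toDF("ColumnA", "ColumnB", "ColumnC")
    test.show()

    // One output column per distinct ColumnB value, counting rows per ColumnA
    val test_pivot = test.groupBy("ColumnA").pivot("ColumnB").agg(count("ColumnC"))

    // Combinations with no rows come back as null; replace them with 0
    test_pivot.na.fill(0).show(false)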

And the output:

+-------+--------+----------+---------------+
|ColumnA|Complete|Inprogress|Not yet started|
+-------+--------+----------+---------------+
|3      |0       |0         |2              |
|1      |2       |0         |0              |
|2      |0       |2         |0              |
+-------+--------+----------+---------------+
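If the set of ColumnB values is known in advance, `pivot` also has an overload that takes the values explicitly, `pivot(pivotColumn, values)`. Spark then skips the extra job it otherwise runs to collect the distinct values, the output columns follow the listed order, and a value that is not in the list gets no column. This is a minimal sketch, assuming Spark 2.x or 3.x on the classpath and a local master; the names `ExplicitPivot`, `pivotStatuses` and `statuses` are hypothetical and only for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.count

object ExplicitPivot {
  // Hypothetical list of every expected ColumnB value; its order is the column order.
  // A status missing from this list gets no output column.
  val statuses: Seq[String] = Seq("Complete", "Inprogress", "Not yet started")

  def pivotStatuses(df: DataFrame): DataFrame =
    df.groupBy("ColumnA")
      .pivot("ColumnB", statuses) // explicit values: no separate distinct-value job
      .agg(count("ColumnC"))
      .na.fill(0)                 // (ColumnA, status) pairs with no rows may be null
      .orderBy("ColumnA")         // deterministic row order

  def main(args: Array[String]): Unit = {
    // Assumption: a local run; drop .master(...) when submitting to a cluster.
    val spark = SparkSession.builder()
      .appName("ExplicitPivot")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val test = Seq(
      ("1", "Complete", "yes"),
      ("1", "Complete", "yes"),
      ("2", "Inprogress", "no"),
      ("2", "Inprogress", "no"),
      ("3", "Not yet started", "initiate"),
      ("3", "Not yet started", "initiate")
    ).toDF("ColumnA", "ColumnB", "ColumnC")

    pivotStatuses(test).show(false)
    spark.stop()
  }
}
```

Sorting by ColumnA also makes the row order stable, which plain `show()` after a `groupBy` does not guarantee.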
做自己的国王
Reply #3 · 2019-07-23 20:24

Use CASE inside a SUM aggregation. This is plain SQL that runs in both Hive and Impala:

select ColumnA,
       sum(case when ColumnB='complete'        then 1 else 0 end) as Complete,
       sum(case when ColumnB='In progress'     then 1 else 0 end) as In_progress,
       sum(case when ColumnB='Not yet started' then 1 else 0 end) as Not_yet_started
  from your_table  --replace with your table name (TABLE itself is a reserved word)
 group by ColumnA
 order by ColumnA --remove if order is not necessary
;
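The query is conditional aggregation: the CASE expression turns each row into a 1 for the status column it matches and a 0 for the others, and SUM adds those up per ColumnA. Below is a plain-Scala sketch of the same rule (standard library only, no Hive, Impala or Spark) applied to the question's sample rows. `StatusRow`, `PivotSketch` and `pivotCounts` are hypothetical names for illustration, and, like `ColumnB='complete'` in the query, the string match is case-sensitive:

```scala
// Hypothetical row type mirroring the question's three columns.
final case class StatusRow(columnA: Int, columnB: String, columnC: String)

object PivotSketch {
  // Output columns, in the same order as the query's CASE branches.
  val statuses: Seq[String] = Seq("complete", "In progress", "Not yet started")

  // For each ColumnA group: sum(case when ColumnB = s then 1 else 0 end) for every status s.
  def pivotCounts(rows: Seq[StatusRow]): Seq[(Int, Seq[Int])] =
    rows.groupBy(_.columnA).toSeq.sortBy(_._1).map { case (key, group) =>
      key -> statuses.map(s => group.map(r => if (r.columnB == s) 1 else 0).sum)
    }

  // The question's sample data.
  val sample: Seq[StatusRow] = Seq(
    StatusRow(1, "complete", "Yes"),
    StatusRow(1, "complete", "Yes"),
    StatusRow(2, "In progress", "No"),
    StatusRow(2, "In progress", "No"),
    StatusRow(3, "Not yet started", "initiate"),
    StatusRow(3, "Not yet started", "initiate")
  )

  def main(args: Array[String]): Unit =
    // Prints "1 2 0 0", "2 0 2 0" and "3 0 0 2", one per line
    pivotCounts(sample).foreach { case (key, counts) =>
      println((key +: counts).mkString(" "))
    }
}
```

A status that no CASE branch names is not counted in any column, so the query has to be edited whenever a new ColumnB value appears.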