For the dataset below, to get total summary values per Col1, I did:
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(
  lit("Total").as("Col2"),
  sum("price").as("price"),
  sum("displayPrice").as("displayPrice")
)
and then merged it with the original df:
df.union(totaldf).orderBy(col("Col1"), col("Col2")).show(false)
The original df:
+-----------+-------+-------+--------------+
| Col1      | Col2  | price | displayPrice |
+-----------+-------+-------+--------------+
| Category1 | item1 | 15    | 14           |
| Category1 | item2 | 11    | 10           |
| Category1 | item3 | 18    | 16           |
| Category2 | item1 | 16    | 15           |
| Category2 | item2 | 11    | 10           |
| Category2 | item3 | 19    | 17           |
+-----------+-------+-------+--------------+
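For reproducibility, the df above can be built with something like the following (a sketch; the local SparkSession setup is an assumption):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("totals").getOrCreate()
import spark.implicits._

// Sample rows matching the table above.
val df = Seq(
  ("Category1", "item1", 15, 14),
  ("Category1", "item2", 11, 10),
  ("Category1", "item3", 18, 16),
  ("Category2", "item1", 16, 15),
  ("Category2", "item2", 11, 10),
  ("Category2", "item3", 19, 17)
).toDF("Col1", "Col2", "price", "displayPrice")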
After merging:
+-----------+-------+-------+--------------+
| Col1      | Col2  | price | displayPrice |
+-----------+-------+-------+--------------+
| Category1 | Total | 44    | 40           |
| Category1 | item1 | 15    | 14           |
| Category1 | item2 | 11    | 10           |
| Category1 | item3 | 18    | 16           |
| Category2 | Total | 46    | 42           |
| Category2 | item1 | 16    | 15           |
| Category2 | item2 | 11    | 10           |
| Category2 | item3 | 19    | 17           |
+-----------+-------+-------+--------------+
Now I want a summary of the whole dataset, as below: in addition to the per-category rows, it has a grand-total row with Col1 set to "Total" that aggregates the data across all Col1 and Col2 values. Required output:
+-----------+-------+-------+--------------+
| Col1      | Col2  | price | displayPrice |
+-----------+-------+-------+--------------+
| Total     | Total | 90    | 82           |
| Category1 | Total | 44    | 40           |
| Category1 | item1 | 15    | 14           |
| Category1 | item2 | 11    | 10           |
| Category1 | item3 | 18    | 16           |
| Category2 | Total | 46    | 42           |
| Category2 | item1 | 16    | 15           |
| Category2 | item2 | 11    | 10           |
| Category2 | item3 | 19    | 17           |
+-----------+-------+-------+--------------+
How can I achieve the above result?
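For what it's worth, following the same pattern, I assume the grand-total row could be built by aggregating the whole df without a groupBy, roughly as in the untested sketch below (granddf is just a name I made up). The part I can't get right is the ordering: a plain orderBy on Col1 sorts the "Total" row after the categories instead of first.

import org.apache.spark.sql.functions._

// Hypothetical: one grand-total row aggregated over the whole df (no groupBy),
// with the columns arranged to line up with df for the positional union.
val granddf = df
  .agg(sum("price").as("price"), sum("displayPrice").as("displayPrice"))
  .select(lit("Total").as("Col1"), lit("Total").as("Col2"), col("price"), col("displayPrice"))

// This unions cleanly, but "Total" sorts after "Category2" in ascending order,
// so the grand-total row ends up last rather than first.
df.union(totaldf).union(granddf).orderBy(col("Col1"), col("Col2")).show(false)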