Select row with MAX value per category Power BI

2019-07-02 03:29发布

问题:

How to select row with max value per category in M of Power BI. Suppose we have table:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     1 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     3 | 2018-07-03 |
| bananas  |     7 | 2018-07-04 |
| bananas  |     8 | 2018-07-05 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

Desired results are:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     3 | 2018-07-03 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

Here is a start table for PBI:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Value", Int64.Type}, {"Date", type date}})
in
    #"Changed Type"

I wonder if there is a way to come to desired results in subsequent steps within only one table, by adding some magic column IsMax:

+----------+-------+------------+-------+
| Category | Value |    Date    | IsMax |
+----------+-------+------------+-------+
| apples   |     1 | 2018-07-01 |     0 |
| apples   |     2 | 2018-07-02 |     0 |
| apples   |     3 | 2018-07-03 |     1 |
| bananas  |     7 | 2018-07-04 |     0 |
| bananas  |     8 | 2018-07-05 |     0 |
| bananas  |     9 | 2018-07-06 |     1 |
+----------+-------+------------+-------+

回答1:

Doing a basic Group By in the Power Query Editor (group by Category and take the max over Value) gets you this table:

+----------+-------+
| Category | Value |
+----------+-------+
| apples   |     3 |
| bananas  |     9 |
+----------+-------+

Add a custom column IsMax that is simply the value 1 to this table and then merge (left outer join) it with your original table matching on both Category and Value. Finally, expand the IsMax column to get your desired table, except with null instead of 0. You can replace the null values if you choose.

Here's the M code for all those steps:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "IsMax", each 1, Int64.Type),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Category", "Value"},#"Added Custom",{"Category", "Value"},"Added Custom",JoinKind.LeftOuter),
    #"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"IsMax"}, {"IsMax"})
in
    #"Expanded Added Custom"


回答2:

Create a calculated column:

IsMax =
VAR Max_Value =
    CALCULATE (
        MAX ( Table[Value] ),
        FILTER ( Table, Table[Category] = EARLIER ( Table[Category] ) )
    )
RETURN
    IF ( Table[Value] = Max_Value, 1, 0 )

How it works: First, FILTER function selects all records in the table that have the same Category as the current row, and finds their max value. The result is saved into a variable. Second, IF compares current row value and saved max value.



回答3:

I ended up getting MAX per category through index. Idea described here: https://stackoverflow.com/a/51498237/1903793

Approach #1 is one-liner snipped in R transformation:

library(dplyr)
output <- dataset %>% group_by(Category) %>% mutate(row_no_by_category = row_number(desc(Date)))

Approach #2, done completely in PBI:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Grouped rows" = Table.Group(Source, {"Category"}, {{"NiceTable", each Table.AddIndexColumn(Table.Sort(_,{{"Date", Order.Descending}} ), "Index",1,1), type table}} ),
    #"Expanded NiceTable" = Table.ExpandTableColumn(#"Grouped rows", "NiceTable", {"Value", "Date", "Index"}, {"Value", "Date", "Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded NiceTable", each ([Index] = 1))
in
    #"Filtered Rows"