Power Query - Add Column CountIF across two tables

2019-09-05 18:29发布

问题:

I have two tables in PowerQuery City and People.

CITY

Id, Name
 1, Brisbane
 2, Sydney
 3, Melbourne

PEOPLE

Id, Name,  CityId
 1, Jay,   1
 2, Sam,   2
 3, Paul,  1
 4, Sarah, 3

I'd like to add a column to City that shows a count of how many people belong to that city. So far I have:

Table.AddColumn(City, "People.Count", each
    Table.RowCount(
        Table.SelectRows(
            People, 
            each [CityId] = [Id]
        )
    )
)

This returns all zeros in the new column. If I replace [Id] with 1 then I get 2.

It appears the [] references the value on the current row but on the same table. Is there a way I can write it as

Table.AddColumn(City, "People.Count", each 
    Table.RowCount(
        Table.SelectRows(
            People, 
            each People[CityId] = City[Id]
        )
    )
)

In Excel this would be as basic as

=COUNTIF(People[CityId],[@Id])

and then pulled down the new column in PEOPLE (as per attached .xlsx)

Seems simple enough but absolutely stuck! Eeek.

回答1:

I would build a Query that starts from PEOPLE, and then Groups By City and calculates a Count.

Then I would Merge the CITY query with that new Query and expand the Count column.

No code or formulas are required - it's actually simpler than Excel formulas.



回答2:

Ah, now I see what you're after. There's a nifty trick to do it with a simple column:

= Table.AddColumn(City, "People.Count", each let Id=[Id] in Table.RowCount(Table.SelectRows(People, each [CityId] = Id)))

You need to define the column name as a variable in order to "leave" the current "context" you're in (People's table).



回答3:

There are other answers about how to write this differently, but the most general solution here is probably to remove the nested each syntax.

It might be helpful to see that your query

Table.AddColumn(
    City,
    "People.Count", each 
      Table.RowCount(Table.SelectRows(
        People, 
        each [CityId] = [Id])))

is syntactic sugar for this rewritten version. In the _[CityId] = _[Id] expression, the _ variable is bound the innermost parameter, and the outermost _ isn't used:

Table.AddColumn(
    City,
    "People.Count", (_) =>
      Table.RowCount(Table.SelectRows(
        People, 
        (_) => _[CityId] = _[Id])))

If you ever have nested each expressions, I'd change the outer each to a function with a real variable name, like this:

Table.AddColumn(
    City,
    "People.Count", (CityRow)  =>
      Table.RowCount(Table.SelectRows(
        People, 
        each [CityId] = CityRow[Id])))