Compare each column's contents with all other

2019-08-16 20:28发布

问题:

Given this table:

I'd like to derive this table:

...sort of like a mileage chart in a map book.

I'm trying to create a cross-table comparison of the words in each of the columns, against all of the other columns' words, to show how many matches there are between them.

For instance, comparing Column 1 against Column2 might yield 4 matches. The yellow, bold outlined cells are the matches.

And here's how I count them:

I'm thinking there might be an 'easy' way to accomplish this using Power Query. Is there?

(Oh...and by the way...the solution I'm looking for should not expect a static number of input columns: i.e., it should accommodate for more columns or less columns to be used in the input comparison set.)

Thanks.

回答1:

No, there is no easy way, but it can be done. However, I get different results. My interpretation of your logic is: for each column combination, the number of occurrences of each common word in 1 column must be multiplied with the number of occurrences in the other column. These are my results:

And this is my query code:

let
    Source = Table1,
    ColumnNames = Table.ColumnNames(Source),
    Tabled = Table.FromColumns({ColumnNames}, type table[Columns = text]),
    AddedColumns2 = Table.AddColumn(Tabled, "Columns2", each ColumnNames, type {text}),
    ExpandedColumns2 = Table.ExpandListColumn(AddedColumns2, "Columns2"),
    CommonWords = 
        Table.AddColumn(ExpandedColumns2, 
                        "DistinctIntersect", 
                        each if [Columns] = [Columns2]
                           then {} 
                           else List.Distinct(List.Intersect({Table.Column(Source,[Columns]),
                                                              Table.Column(Source,[Columns2])}))),
    AddedCount = 
        Table.AddColumn(CommonWords,
                        "Count", 
                        (This) => List.Sum({0}&List.Transform(This[DistinctIntersect],
                                                   each List.Count(List.PositionOf(Table.Column(Source,This[Columns]),_,2)) *
                                                        List.Count(List.PositionOf(Table.Column(Source,This[Columns2]),_,2)))),
                       Int64.Type),
    RemovedColumns = Table.RemoveColumns(AddedCount,{"DistinctIntersect"}),
    PivotedColumn = Table.Pivot(RemovedColumns, List.Distinct(RemovedColumns[Columns2]), "Columns2", "Count")
in
    PivotedColumn


标签: powerquery m