Counting unique data in a column and then matching

2019-08-02 03:10发布

问题:

This is an embarrassingly simple question, but I was not able to find a way to figure it out for hours nor also to find an idea from online search.

Suppose I have the following data (the actual data can be thousands or millions) in Excel sheet (Table 1):

Name    No. ID
A       A1
B       B4
B       B5
C       C0
D       -
A       A1
A       A2
E       -
F       -
C       C0
B       B5
B       B5
B       B5
B       B6
A       A1
A       A1
A       A2
B       B3
B       B3
B       B3

The character (-) in column No. ID above can be also a number 0 or a blank cell.

I want to format the above data as follow (Table 2)

Name    Number of ID
A       2
B       4
C       1
D       0
E       0
F       0

It means A has 2 IDs (A1 and A2), B has 4 IDs (B1, B2, B3, and B4), C has 1 ID (C0), and D, E, and F have no ID.

The best I can get by using Pivot Table is like this

How does one perform a task like Table 2 in MS Excel? If possible a VBA script of it.

回答1:

If anyone interested in a VBA solution for the above problem:

Sub GetUniquesCount()
    Dim dict1 As Object, dict2 As Object
    Dim c1 As Variant, c2 As Variant
    Dim i As Long, lastRow As Long, count As Long

    Set dict1 = CreateObject("Scripting.Dictionary")
    Set dict2 = CreateObject("Scripting.Dictionary")

    lastRow = Cells(Rows.count, "A").End(xlUp).Row
    c1 = Range("A2:A" & lastRow)
    For i = 1 To UBound(c1, 1)
        dict1(c1(i, 1)) = 1
    Next i
    Range("D2").Resize(dict1.count) = Application.Transpose(dict1.keys)

    lastRow = Cells(Rows.count, "B").End(xlUp).Row
    c2 = Range("B2:B" & lastRow)
    For i = 1 To UBound(c2, 1)
        dict2(c2(i, 1)) = 1
    Next i

    For Each cel In Range("D2:D" & Cells(Rows.count, "D").End(xlUp).Row)
        count = 0
        For Each k2 In dict2.keys
            If k2 Like cel.Value & "*" Then
                count = count + 1
            End If
        Next k2
        cel.Offset(0, 1).Value = count
    Next cel
End Sub



回答2:

Copy column A to another unused column and perform a Data ► Remove Duplicates command on it then sort it. Few formulas will work efficiently with thousands of rows of data and none will work at all with millions.

Do the same for column B right beside it. Do not extend the selection whjen asked by Remove Duplicates.

Next, use the following formula in the column next to the unduplicated set of column A data you just created:

=COUNTIF(F:F, E2&"*")

Fill down to the extent of the unduplicated data. I've assumed you used column E to create the first unduplicated data set and column F for the second.