Matching a number in a column with the same number

2019-03-02 13:25发布

问题:

First, I'd like to apologize for this poor question and I hope it doesn't upset anyone here. Since I'm not that good at speaking English to convey my request, so please have a look to the cited links in order to get a clear explanation to this question.

I'm trying to find the solution to this question of mine. I started my attempt by searching for the same number in column A and column B (Debit and Credit). I used the looping-trough-array method to do it instead of employing the Find function like this question since I think it's faster.

Suppose that I have the following set data in Sheet1 and start from row 1 column A:

D e b i t   Cr e d i t
20          13
14          13
13          14
14          17
19          19
11          20
17          14
20          12
19          19
20          15
20          12
13          11
12          19
13          20
19          19
20          11
11          16
10          16
19          19
20          11

Now, I'd like to process the data set above to something like this:

Basically, I need to find the same value of debit and credit in a specific row and match it with debit and credit in another row. Column C (Row) indicates the matched values. For example, the debit value in row 2 match with the credit value in row 15 and vice-versa. And numbers in column D (ID Match) are the label numbers to indicate the order of the matched data that's found first. This is my code in an attempt to implement the task:

Public i As Long, j As Long, k As Long, Last_Row As Long
Public DC, Row_Data, ID_Match
Sub Quick_Match()
T0 = Timer
k = 0
Last_Row = Cells(Rows.Count, "A").End(xlUp).Row

ReDim DC(1 To Last_Row - 1, 1 To 2)
ReDim Row_Data(1 To Last_Row - 1, 1 To 1)
ReDim ID_Match(1 To Last_Row - 1, 1 To 1)
DC = Range("A2:B" & Last_Row)

For i = 1 To Last_Row - 1
    If DC(i, 1) <> "" Then
            k = k + 1
            For j = 1 To Last_Row - 1
                If DC(i, 1) <> DC(i, 2) Then
                    If DC(i, 1) = DC(j, 2) And DC(i, 2) = DC(j, 1) Then
                        Call Row_Label
                        Exit For
                    Else
                        Row_Data(i, 1) = "No Match"
                    End If
                Else
                    If i <> j Then
                        If DC(i, 1) = DC(j, 1) And DC(i, 2) = DC(j, 2) Then
                            Call Row_Label
                            Exit For
                        Else
                            Row_Data(i, 1) = "No Match"
                        End If
                    End If
                End If
            Next j
    End If

    If Row_Data(i, 1) = "No Match" Then
        k = k - 1
    End If

Next i

Range("C2:C" & Last_Row) = Row_Data
Range("D2:D" & Last_Row) = ID_Match
InputBox "The runtime of this program is ", "Runtime", Timer - T0
End Sub

Sub Row_Label()
    Row_Data(i, 1) = j + 1
    ID_Match(i, 1) = k
    Row_Data(j, 1) = i + 1
    ID_Match(j, 1) = k
    DC(i, 1) = ""
    DC(i, 2) = ""
    DC(j, 1) = ""
    DC(j, 2) = ""
End Sub

Though it's a bit slow on its performance, but it works fine. It completes in about 25 seconds on my machine for processing 10,000 rows of data (the data set file can be downloaded on this link for testing the running time of your code and mine). So I'm wondering if there is a more effective way for doing this. Could anyone come up with either a shorter version or a quicker version? Please do share your attempt.

回答1:

I reworked my previous answer introducing a second loop; so that our ID numbers will match.

Sub DebitCreditCrossMatch()

    Dim dictKeys As Object, dictRows As Object
    Dim DebitKey As String, CreditKey As String
    Dim arrDebit, arrCredit, items, keys
    Dim arrMatchRow(), arrMatchID()
    Dim ID As Long, rw As Long, x As Long, lastRow As Long

    lastRow = Cells(Rows.count, "A").End(xlUp).Row

    arrDebit = Range("A1", "A" & lastRow).Value
    arrCredit = Range("B1", "B" & lastRow).Value

    ReDim arrMatchID(lastRow - 2)
    ReDim arrMatchRow(lastRow - 2)

    Set dictKeys = CreateObject("Scripting.Dictionary")

    For x = 2 To lastRow

        DebitKey = arrDebit(x, 1) & ":" & arrCredit(x, 1)

        CreditKey = arrCredit(x, 1) & ":" & arrDebit(x, 1)

        If dictKeys.Exists(CreditKey) Then
            Set dictRows = dictKeys(CreditKey)
            items = dictRows.items
            keys = dictRows.keys
            rw = CLng(items(0))
            arrMatchRow(x - 2) = rw
            arrMatchRow(rw - 2) = x
            dictRows.Remove keys(0)

            If dictRows.count = 0 Then dictKeys.Remove CreditKey

        ElseIf dictKeys.Exists(DebitKey) Then
            Set dictRows = dictKeys(DebitKey)
            dictRows.Add x, x
        Else
            Set dictRows = CreateObject("Scripting.Dictionary")
            dictRows.Add x, x
            dictKeys.Add DebitKey, dictRows
        End If
    Next

    For x = 0 To lastRow - 2

        If Not IsEmpty(arrMatchRow(x)) And IsEmpty(arrMatchID(x)) Then
            rw = arrMatchRow(x) - 2
            arrMatchRow(rw) = x + 2
            ID = ID + 1
            arrMatchID(x) = ID
            arrMatchID(rw) = ID
        Else
            If IsEmpty(arrMatchRow(x)) Then
                arrMatchRow(x) = "No Match"
            End If
        End If

    Next

    Range("C2", "C" & lastRow).Value = WorksheetFunction.Transpose(arrMatchRow)
    Range("D2", "D" & lastRow).Value = WorksheetFunction.Transpose(arrMatchID)

    Set dictKeys = Nothing
    Set dictRows = Nothing

End Sub


回答2:

Our ID's don't are different because I don't search ahead in the list for a match. I iterate over the list one time adding keys to a dictionary. If a find a key already exists that matches your criteria assign the new ID and row numbers.

Let me know if this meets your criteria.

Sub DebitCreditCrossMatch()

    Dim dictKeys As Object, dictRows As Object
    Dim DebitKey As String, CreditKey As String
    Dim arrDebit, arrCredit, arrMatchRow, arrMatchID, items, keys
    Dim ID As Long, rw As Long, x As Long, lastRow As Long

    lastRow = Cells(Rows.count, "A").End(xlUp).Row

    arrDebit = Range("A1", "A" & lastRow).Value
    arrCredit = Range("B1", "B" & lastRow).Value
    arrMatchRow = Range("C1", "C" & lastRow).Value
    arrMatchID = Range("D1", "D" & lastRow).Value

    Set dictKeys = CreateObject("Scripting.Dictionary")

    For x = 2 To lastRow
        arrMatchRow(x, 1) = "No Match"
        arrMatchID(x, 1) = "No Match"

        DebitKey = arrDebit(x, 1) & ":" & arrCredit(x, 1)

        CreditKey = arrCredit(x, 1) & ":" & arrDebit(x, 1)

        If dictKeys.Exists(CreditKey) Then
            Set dictRows = dictKeys(CreditKey)
            items = dictRows.items
            keys = dictRows.keys
            rw = CLng(items(0))
            ID = ID + 1
            arrMatchRow(x, 1) = rw
            arrMatchRow(rw, 1) = x
            arrMatchID(x, 1) = ID
            arrMatchID(rw, 1) = ID
            dictRows.Remove keys(0)

            If dictRows.count = 0 Then dictKeys.Remove CreditKey

        ElseIf dictKeys.Exists(DebitKey) Then
            Set dictRows = dictKeys(DebitKey)
            dictRows.Add x, x
        Else
            Set dictRows = CreateObject("Scripting.Dictionary")
            dictRows.Add x, x
            dictKeys.Add DebitKey, dictRows
        End If
    Next

    Range("C1", "C" & lastRow).Value = arrMatchRow
    Range("D1", "D" & lastRow).Value = arrMatchID

    Set dictKeys = Nothing
    Set dictRows = Nothing

End Sub


回答3:

Assuming there are no duplicated in the Credit-Debit pairs you could use the following methods in a separate module calling matchCreditDebit() and adjusting the ranges in the initialization phase as needed:

            Option Explicit

    Public Sub matchCreditDebit()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    Dim debit As Range, credit As Range, match As Range, rows As Long

    rows = ws.UsedRange.rows.Count
    Set credit = ws.Range("A1:A" & rows)
    Set debit = ws.Range("B1:B" & rows)
    Set match = ws.Range("C1:C" & rows)

    match.Offset(1).Clear 'delete previous matched, start with clean slate, offset used to preserve header

    Dim foundRanges As Collection, i As Long, r As Range

    For i = 2 To rows
        If Len(ws.Cells(i, match.Column).Value) = 0 _
        And Len(ws.Cells(i, credit.Column).Value) > 0 Then  'check if match is already found and credit has value

            Set foundRanges = FindAllInRange(debit, credit.Cells(i, 1).Value) 'first sift, find matching debit with a credit value

            If Not foundRanges Is Nothing Then
                For Each r In foundRanges
                    Debug.Print r.Address, ws.Cells(r.Row, credit.Column).Value
                    If ws.Cells(r.Row, credit.Column).Value = ws.Cells(i, debit.Column) Then 'second sift, match for found credit in debit
                        ws.Cells(r.Row, match.Column).Value = i
                    End If
                Next r
            End If
        End If
    Next i

    End Sub

    Public Function FindAllInRange( _
        ByRef searchRange As Range, _
        ByVal FindWhat As Variant _
    ) As Collection
    Dim result As Collection

        Set result = New Collection

        Dim nextFound As Range
        Set nextFound = searchRange.Cells(searchRange.rows.Count, 1)

        Do
            Set nextFound = searchRange.Find( _
                    What:=FindWhat, _
                    After:=nextFound, _
                    LookIn:=xlValues, _
                    LookAt:=xlWhole, _
                    SearchOrder:=xlByRows)

            If nextFound Is Nothing Then Exit Do
            If collectionContainsRange(result, nextFound) Then Exit Do

            result.Add nextFound, nextFound.Address
        Loop While True
        Set FindAllInRange = result
    End Function

    Private Function collectionContainsRange(ByRef result As Collection, ByRef rng As Range) As Boolean
    collectionContainsRange = False

        Dim r As Range
        For Each r In result
            If StrComp(r.Address, rng.Address, vbTextCompare) = 0 Then
                collectionContainsRange = True
                Exit Function
            End If
        Next r
    End Function

Generally what this does is find the ranges that match the first criteriaon and put them into a collection and from those sift out the ones that match the second criteria. However if there are duplicate pairs the last found reverse pair will be entered as ref.



回答4:

Edit comments: Starting inner loop without lctrRow will not do back-checking. Reinstated original code.

Sub test()

    '/ Assuming that on Sheet1 starting at A1, four headers are : Debit   Credit  Row   ID Match


    Dim lCtrRow         As Long
    Dim lCtrRow2        As Long
    Dim lmatchCount     As Long

    Dim arrResult

    arrResult = Sheet1.UsedRange

    '/ Loop through first column Rows
    For lCtrRow = LBound(arrResult) To UBound(arrResult)
        lmatchCount = 0
        arrResult(lCtrRow, 3) = "No Match"

        '/ Re-Loop but this time match if A&B = B&A
        For lCtrRow2 = LBound(arrResult) + 1 To UBound(arrResult)
            If arrResult(lCtrRow, 1) & arrResult(lCtrRow, 2) = arrResult(lCtrRow2, 2) & arrResult(lCtrRow2, 1) Then
                '/ If no match then only put down the row number. Avoids overwriting.
                If arrResult(lCtrRow, 3) = "No Match" Then
                    arrResult(lCtrRow, 3) = lCtrRow2
                End If

                '/ Keep track of no. matches found.
                lmatchCount = lmatchCount + 1
                arrResult(lCtrRow, 4) = lmatchCount
            End If
        Next
    Next


      '/ Dump the processed result back on  another sheet
      Sheet2.Range("a1").Resize(UBound(arrResult), UBound(arrResult, 2)) = arrResult

 End Sub


回答5:

This worked for me:

Sub Matching()

    Dim rng, arr, r1 As Long, r2 As Long, nR As Long
    Dim sortId As Long, rwTrack(), s1, s2

    'get the input range
    With Range("a1").CurrentRegion 'assumes no blank columns/rows
        Set rng = .Offset(1, 0).Resize(.Rows.Count - 1)
    End With

    arr = rng.Value
    nR = UBound(arr, 1)
    ReDim rwTrack(1 To nR) 'for matching row numbers to sortId
                           '  (should be more like nR/2 but lazy...)
    sortId = 1

    For r1 = 1 To nR
        For r2 = r1 + 1 To nR
            If arr(r1, 1) = arr(r2, 2) And arr(r1, 2) = arr(r2, 1) Then
                s1 = arr(r1, 4)
                s2 = arr(r2, 4)
                If Len(s1) = 0 And Len(s2) = 0 Then
                    'new match - assign new Id
                    arr(r1, 4) = sortId
                    arr(r2, 4) = sortId
                    rwTrack(sortId) = r1 & "," & r2 'capture the rows
                    sortId = sortId + 1
                Else
                    'already matched: copy the existing Id and track rows
                    If Len(s1) > 0 And Len(s2) = 0 Then
                        arr(r2, 4) = s1
                        rwTrack(s1) = rwTrack(s1) & "," & r2
                    End If
                    If Len(s2) > 0 And Len(s1) = 0 Then
                        arr(r1, 4) = s2
                        rwTrack(s2) = rwTrack(s2) & "," & r1
                    End If
                End If
            End If
        Next r2
    Next r1
    'populate all of the matched row numbers
    For r1 = 1 To nR
        If arr(r1, 4) <> "" Then arr(r1, 3) = rwTrack(arr(r1, 4))
    Next r1

    'dump the data back
    Range("a1").Offset(1, 0).Resize(nR, UBound(arr, 2)).Value = arr

End Sub

Before and after:



回答6:

To speed up the matching, can improve thru Algorithm. Assuming your code are working fine.

1) We can sort Column A then Column B, therefore, your data will be like this

Row   A    B
2     20   13
3     20   12
4     20   11
.
.
. 
998   13   20
999   12   20
1000  11   20
.
.
.

2) While looping Debit column to find the first value 20 within Credit column might have a very huge gap. We can then add in application.Match(20,Range("B:B"),0) to find out the row to start the loop.

Base on above assumption, we can reduce about 1000 times of loop. (in real case, it could be more/less). Application.Match() is lot more faster than looping one by one.

3) Exit the loop, when Credit value is less than Debit value, because we have sort the data in sequence, we can assume there have no possible match, when Credit < Debit.

4) While the use Application.ScreenUpdating = False, can increase the processing speed.

5) Without touch the original data, also can use Application.Match to reduce row by row loop. Assume, you have 10K records,

first set searchRng as C1:C10000, then match to find the row of the first Debit value (20, base on yr photo),

then we found matched record on Row 7, check if the record match both Debit & Credit, if not reduce the resize of searchRng to C8:C10000 then keep repeating the logic

Sub Match ()
   For nRow = 2 to lastRow 'Loop for each row in Column A
       set searchRng = Range("C1:C10000")
       debitVal = Cells(nRow, "B")

       Do until searchRng is Nothing
          If IsError(Application.Match(debitVal, searchRng, 0)) then
            'No Match
            Exit Do
          Else
             N = Application.Match(debitVal, searchRng, 0)
             'Do something to check if Record match, and assign ID Match
             If IsRecordMatched Then
                'Assign ID
                'Matching Range - Cells(nRow,"B")
                'Matched Range - Cells(searchRng.Cells(1).Offset(N,0).Rows, "B")
             Else
                'Resize the searchRng
                nSize = searchRng.Cells.Count - (N + 1)
                if nSize < 1 then Exit Do
                set searchRng = searchRng.Resize(nSize,1)
                set searchRng = searchRng.Offset(N + 1,0)
             End If
          End If
       Loop
   Next nRow
End Sub

Above code not tested. Please take it as reference.



回答7:

Improvement

The following code completes less than 2.4 seconds on average. It's twice faster than the previous one and also shorter.

Sub Quick_Match()
Dim i As Long, j As Long, k As Long, Last_Row As Long
Dim DC, Row_Data, ID_Match
Last_Row = Cells(Rows.Count, "A").End(xlUp).Row
ReDim DC(1 To Last_Row - 1, 1 To 2)
ReDim Row_Data(1 To Last_Row - 1, 1 To 1)
ReDim ID_Match(1 To Last_Row - 1, 1 To 1)
DC = Range("A2:B" & Last_Row).Value2

For i = 1 To Last_Row - 2
    If DC(i, 1) <> vbNullString Then
            k = k + 1
            For j = i + 1 To Last_Row - 1
            If DC(j, 2) <> vbNullString Then
                If DC(i, 1) = DC(j, 2) And DC(i, 2) = DC(j, 1) Then
                    Row_Data(i, 1) = j + 1: ID_Match(i, 1) = k
                    Row_Data(j, 1) = i + 1: ID_Match(j, 1) = k
                    DC(i, 1) = vbNullString: DC(i, 2) = vbNullString
                    DC(j, 1) = vbNullString: DC(j, 2) = vbNullString
                    Exit For
                End If
            End If
            Next j
    End If

    If Row_Data(i, 1) = vbNullString Then
        Row_Data(i, 1) = "No Match": k = k - 1
    End If
Next i

Range("C2:C" & Last_Row) = Row_Data
Range("D2:D" & Last_Row) = ID_Match
End Sub

[Old answer] Some progress. The following code completes less than 5.2 seconds on average for processing 10,000 rows of data on my machine. Not only is faster, but it's also a bit shorter than the previous one. I changed the looping algorithm to improve its performance. I also use some speedup tricks like using .Value2rather than the default property (.Value) makes Excel do less processing and assigning vbNullString instead of Zero Length String ("") to the elements of the array that has found its match or labeled "No Match" so that the loop procedure won't process it again.

Public i As Long, j As Long, k As Long, Last_Row As Long
Public DC, Row_Data, ID_Match
Sub Quick_Match()
T0 = Timer
k = 0
Last_Row = Cells(Rows.Count, "A").End(xlUp).Row

ReDim DC(1 To Last_Row - 1, 1 To 2)
ReDim Row_Data(1 To Last_Row - 1, 1 To 1)
ReDim ID_Match(1 To Last_Row - 1, 1 To 1)
DC = Range("A2:B" & Last_Row).Value2

For i = 1 To Last_Row - 1
    If DC(i, 1) <> vbNullString Then
            k = k + 1
            For j = 1 To Last_Row - 1
            If DC(j, 2) <> vbNullString Then
                If DC(i, 1) <> DC(i, 2) Then
                    If DC(i, 1) = DC(j, 2) And DC(i, 2) = DC(j, 1) Then
                        Call Row_Label
                        Exit For
                    End If
                Else
                    If i <> j Then
                        If DC(i, 1) = DC(j, 1) And DC(i, 2) = DC(j, 2) Then
                            Call Row_Label
                            Exit For
                        End If
                    End If
                End If
            End If
            Next j
    End If

    If Row_Data(i, 1) = vbNullString Then
        Row_Data(i, 1) = "No Match"
        DC(i, 2) = vbNullString
        k = k - 1
    End If

Next i

Range("C2:C" & Last_Row) = Row_Data
Range("D2:D" & Last_Row) = ID_Match
InputBox "The runtime of this program is ", "Runtime", Timer - T0
End Sub

Sub Row_Label()
    Row_Data(i, 1) = j + 1
    ID_Match(i, 1) = k
    Row_Data(j, 1) = i + 1
    ID_Match(j, 1) = k
    DC(i, 2) = vbNullString
    DC(j, 1) = vbNullString
    DC(j, 2) = vbNullString
End Sub