Duplicate Removal using an array in vba [duplicate

2019-08-30 22:29发布

This question already has an answer here:

Hi have used some code from an answer to a question 'How do I delete duplicates between two excel sheets quickly vba' and tried to alter this code to suite my own VBA script. the code does delete rows the same amount as to what is in the array but it is just deleting the first 11 rows. I am fairly new to VBA and not completely understanding why it is doing this. Below is a copy of the script I am using.

    Dim overLayWB As Workbook       'Overlay_workbook
    Dim formattedWB As Workbook     'Formatted_workbook
    Dim formattedWS As Worksheet    'Current active worksheet (Formatted)
    Dim overLayWS As Worksheet      'Worksheet in OverLay
    Dim lastRowFormatted As Long
    Dim lastRowOverLay As Long

    Dim targetArray, searchArray
    Dim targetRange As Range
    Dim x As Long

    'Update these 4 lines if your target and search ranges change
    Dim TargetSheetName As String: TargetSheetName = "Formatted"
    Dim TargetSheetColumn As String: TargetSheetColumn = "G22"
    Dim SearchSheetName As String: SearchSheetName = "Overlay"
    Dim SearchSheetColumn As String: SearchSheetColumn = "G22"



    'open Overlay workbook
    Set overLayWB = Workbooks.Open("C:\Documents\Templates\Overlaye.xls") 'Path for workbook Overlay to copy from
    Set formattedWS = Workbooks("Formatted").Sheets("DLT Formatted")
    Set overLayWS = Workbooks("Overlay").Sheets("Overlay")
    Set formattedWB = ThisWorkbook

 'Load target array
    With formattedWS
        Set targetRange = .Range(.Range(TargetSheetColumn & "7"), _
                .Range(TargetSheetColumn & Rows.Count).End(xlUp))
        targetArray = targetRange
    End With

 'Load Search Array
    With overLayWS
        searchArray = .Range(.Range(SearchSheetColumn & "7"), _
                .Range(SearchSheetColumn & Rows.Count).End(xlUp))
    End With


    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    'Populate dictionary from search array
    If IsArray(searchArray) Then
        For x = 1 To UBound(searchArray)
            If Not dict.exists(searchArray(x, 1)) Then
                dict.add searchArray(x, 1), 1
            End If
        Next
    Else
        If Not dict.exists(searchArray) Then
            dict.add searchArray, 1
        End If
    End If

    'Delete rows with values found in dictionary
    If IsArray(targetArray) Then
        'Step backwards to avoid deleting the wrong rows.
        For x = UBound(targetArray) To 1 Step -1
            If dict.exists(targetArray(x, 1)) Then
                targetRange.Cells(x).EntireRow.Delete
            End If
        Next
    Else
        If dict.exists(targetArray) Then
            targetRange.EntireRow.Delete
        End If
    End If

Can anyone help me with this it would be much appreicated, I have not altered the scripting correctly, or is it missing something?

2条回答
贪生不怕死
2楼-- · 2019-08-30 22:48

This looks off:

With formattedWS
    Set targetRange = .Range(.Range(TargetSheetColumn & "7"), _
            .Range(TargetSheetColumn & Rows.Count).End(xlUp))
    targetArray = targetRange
End With

with your supplied values it translates to:

With formattedWS
    Set targetRange = .Range(.Range("G227"), _
            .Range("G221048576").End(xlUp))
    targetArray = targetRange
End With 

I don't think that's what you intended and should throw an error.

查看更多
狗以群分
3楼-- · 2019-08-30 22:51

It seems to have become almost received wisdom on this site that the task of deleting rows is best achieved by looping through a Range from bottom to top and deleting each individual row whenever criteria are met. Yet this is really quite an inefficient method. Compare these two snippets, for example:

Dim r As Long
Dim clock As cTimer

Set clock = New cTimer

clock.StartCounter
Application.ScreenUpdating = False
For r = 1 To 10000
    Sheet1.Cells(1, 1).EntireRow.Delete
Next
Application.ScreenUpdating = True
Debug.Print "Row by row:"; clock.TimeElapsed; "ms"

clock.StartCounter
Application.ScreenUpdating = False
Sheet1.Range("A1:A10000").EntireRow.Delete
Application.ScreenUpdating = True
Debug.Print "Range:"; clock.TimeElapsed; "ms"

Output is as follows:

Row by row: 2876.18174935641 ms

Range: 15.2153416146466 ms

These results aren't surprising as it's probably fair to generalise that the greater the number of individual interactions with a Worksheet, the slower the programme will be.

What's a shame is that some of the posts to do with removing duplicates go to great lengths to read Worksheet values and reference items into arrays in order to avoid excessive sheet interactions. And yet all of those efficiency gains are lost to inefficient row deletion. What's misleading is that these posts sometimes purport to be "quick".

Some might argue that they want to carry out tasks on the Worksheet in between row deletions. However, the VBA ranges update their addresses in the same way that an Excel formula range does. Have a look at the code below for an example of this:

Dim cell As Range

Set cell = Sheet1.Range("A3")
Debug.Print "Address before deletion:"; cell.Address
Sheet1.Range("A1").EntireRow.Delete
Debug.Print "Address after deletion:"; cell.Address

Output is:

Address before deletion:$A$3

Address after deletion:$A$2

So the following code would still delete cells "A4" and "A6" and the original cells "A8" and "A10", for example:

Dim rng1 As Range
Dim rng2 As Range

Set rng1 = Sheet1.Range("A4, A6")
Set rng2 = Sheet1.Range("A8, A10")
rng1.EntireRow.Delete
Sheet1.Range("A5").Insert xlDown
rng2.EntireRow.Delete

For a practical application, the OP could genuinely answer the question of 'How do I delete duplicates between two excel sheets quickly vba'? with the following code:

Private Sub RemoveMatchingRowsAsBatch(refRange As Range, targetRange As Range)
    Dim refValues As Variant
    Dim refItems As Collection
    Dim refIndex As Long
    Dim refKey As String
    Dim targetValues As Variant
    Dim targetIndex As Long
    Dim targetKey As String
    Dim test As Variant
    Dim delRows As Range
    Dim added As Boolean

    'Read datasets into arrays
    refValues = refRange.Value2
    targetValues = targetRange.Value2

    'Loop through target values and check if items match
    Set refItems = New Collection
    For targetIndex = 1 To UBound(targetValues, 1)
        If Not IsEmpty(targetValues(targetIndex, 1)) Then
            targetKey = CStr(targetValues(targetIndex, 1))
            test = Empty: On Error Resume Next
            test = refItems(targetKey): On Error GoTo 0

            'Check if existing ref item list has a match
            If Not IsEmpty(test) Then
                targetRange.Cells(targetIndex, 1).EntireRow.Delete
                If delRows Is Nothing Then
                    Set delRows = targetRange.Cells(targetIndex, 1)
                Else
                    Set delRows = Union(delRows, targetRange.Cells(targetIndex, 1))
                End If
            Else
                'There is no match so continue reading the reference list.
                Do While refIndex < UBound(refValues, 1)
                    refIndex = refIndex + 1
                    If Not IsEmpty(refValues(refIndex, 1)) Then
                        'Test that the new reference item isn't itself a duplicate.
                        refKey = CStr(refValues(refIndex, 1))
                        On Error Resume Next
                        refItems.Add refKey, refKey
                        added = Err.Number = 0
                        On Error GoTo 0
                        'It isn't a duplicate so check for a match.
                        If added Then
                            If refKey = targetKey Then
                                If delRows Is Nothing Then
                                    Set delRows = targetRange.Cells(targetIndex, 1)
                                Else
                                    Set delRows = Union(delRows, targetRange.Cells(targetIndex, 1))
                                End If
                                Exit Do
                            End If
                        End If
                    End If
                Loop


            End If
        End If
    Next

    'Now delete all rows in one 'batch'.
    If Not delRows Is Nothing Then
        delRows.EntireRow.Delete
    End If

End Sub

In actual fact, there are also some misunderstandings about the role and function of variables in the OP's code, and other respondents have already pointed those out. However, in the interest of completeness, a correct reading routine for his/her two Worksheets might be something like the below:

Public Sub ReadSheets()
    Dim refFilePath As String
    Dim refBookName As String
    Dim refBook As Workbook
    Dim refSheet As Worksheet
    Dim refSheetName As String
    Dim refCol As String
    Dim refRow As Long
    Dim refRange As Range
    Dim refValues As Variant
    Dim targetBook As Workbook
    Dim targetSheet As Worksheet
    Dim targetSheetName As String
    Dim targetCol As String
    Dim targetRow As Long
    Dim targetRange As Range
    Dim targetValues As Variant

    'Define your sheet variables.
    refFilePath = "Z:\ambie\VBA"
    refBookName = "reference.xlsx"
    refSheetName = "data"
    refCol = "A"
    refRow = "2"
    targetSheetName = "uniques"
    targetCol = "B"
    targetRow = "3"

    'Define the Excel the sheet objects.
    On Error Resume Next
    Set refBook = Workbooks(refBookName)
    On Error GoTo 0
    If refBook Is Nothing Then
        Set refBook = Workbooks.Open(refFilePath & "\" & refBookName)
    End If
    Set refSheet = refBook.Worksheets(refSheetName)
    Set targetBook = ThisWorkbook
    Set targetSheet = targetBook.Worksheets(targetSheetName)

    'Read both datasets.
    With refSheet
        Set refRange = .Range(.Cells(refRow, refCol), _
                              .Cells(.Rows.Count, refCol).End(xlUp))
    End With

    With targetSheet
        Set targetRange = .Range(.Cells(targetRow, targetCol), _
                                 .Cells(.Rows.Count, targetCol).End(xlUp))
    End With

    'Call the removal routine here
    RemoveMatchingRowsAsBatch refRange, targetRange
End Sub
查看更多
登录 后发表回答