I have 20K data sheet that contain multiple duplicates for column 1. I need to keep 1st three duplicates for column 1 for each number and delete rest of it.
I need to keep the yellow highlighted and delete the rest of it.
I have 20K data sheet that contain multiple duplicates for column 1. I need to keep 1st three duplicates for column 1 for each number and delete rest of it.
I need to keep the yellow highlighted and delete the rest of it.
So, going the VBA route, you'll need to loop through each row in your tab and test the value to see if it's a duplicate, if it's a duplicate then you'll increase a counter variable and once that counter variable hits 3 you start deleting rows.
This is a little complicated if you haven't worked with VBA. Please take some time to play with the code and understand it. I've written comments in it to help out.
Sub keepFirstThreeDuplicates()
Dim workingRow As Integer
Dim currentDup As String
Dim dupCounter As Integer
Dim wsheet As Worksheet
'change this to your tab name
Set wsheet = ThisWorkbook.Sheets("Sheet1")
'loop through every row just guessing that your data starts at row 1 (A1) and goes to 50000 (A50000)
For workingRow = 1 To 50000
If workingRow = 1 Then 'we are at the first row, so grab the value and set dupCounter to 1
currentDup = wsheet.Cells(workingRow, 1).Value 'Assuming column 1, so this is Cell A1
dupCounter = 1
ElseIf currentDup = wsheet.Cells(workingRow, 1).Value Then 'we have another duplicate
If dupCounter = 3 Then 'We already have three duplicates, so delete the row, and set the row back one (because we deleted the row)
wsheet.Rows(workingRow).Delete
workingRow = workingRow - 1
Else
dupCounter = dupCounter + 1
End If
Else 'We are at a new value, so grab the value and set dupCounter to 1
currentDup = wsheet.Cells(workingRow, 1).Value
dupCounter = 1
End If
'exit the for loop if we hit a blank
If currentDup = "" Then Exit For
Next workingRow
End Sub
If you are super new to VBA, to use this code:
While in your workbook, hit Ctrl+F11 to get to the Visual Basic Editor (VBE)
Your workbook will be called a "VBAProject" in the VBAProject panel. Right click on it and select Insert>>Module
Double click your new module "Module1" to open it.
Paste in this code.
To run it, click somewhere in the code and hit the play button up top (or F5 on your keyboard). Make sure you edit the code to suit the needs of your workbook like changing the tab name to your tab. Also make sure to back up your workbook before running it as this will delete rows and you won't be able to undo.
Lastly, this is just one way to do it. It's not meant to 100% address your needs since I'm guessing at number of rows, which column your data is in, that the data is already sorted by your duplicate column, and other whatnot. This is just to point you in a solid direction in VBA. It worked on my test workbook for a fake list I created in Column A of a fresh workbook.
An alternate approach without using VBA and does not require your Data to be sorted is to create a third column of data where you simply count the number of times a Sender ID has happened.
Assuming:
Sender is column B
Message is column C
Count will be new column C
Row 1 is a header row
Data starts in row 2
in D2 place the the follow formula:
=COUNTIF($B$2:B2,B2)
Copy that down the entire column.
Select a select within your data or in your header row.
Right click on the header in D1 and select filter and from the fly out select by cells value. (Your data may disappear but its just hiding)
Left click on the funnel icon in the D1 and select number filter. From the fly out select less than or equal to and place the number 3 in the top right box.
Select your original 2 columns of data and copy and paste to a new sheet and you will only have the first three entries in the new list.