I'm an Excel VBA newbie and i'm trying to get the duplicates rows to appends to the first occurence of that row.
Per exemple we have the table here
I would like to format data as here
The logic goes like this. Whenever we detect that the last name and the birth date are the same for the current and following line that mean we have a dependant and we need to append the dependant's data to the "Main"
I have started writing code but i'm not able to detect the dependants properly. Below is what i have. please consider that i'm a real noob and i'm trying hard.
Sub formatData()
Dim sh As Worksheet
Dim rw As Range
Dim RowCount As Integer
'This variable is checked to see if we have a first occurence of a line
Dim firstOccurence
'Initialise the variables for that will be used to match the data
Dim LocationName
Dim PlanCode
Dim LastName
Dim FirstName
Dim dependantFirstName
Dim dependantLastName
Dim dependantBirthdate
RowCount = 0
firstOccurence = True
'Check if the spreadsheet already exist if not create it.
For i = 1 To Worksheets.Count
If Worksheets(i).Name = "Benefits Census Formatted" Then
exists = True
End If
Next i
If Not exists Then
'Create a new spreadsheet to add the data to
Set ws = Sheets.Add
Sheets.Add.Name = "Benefits Census Formatted"
End If
'Set the ActiveSheet to the one containing the original data
Set sh = Sheets("BENEFIT Census")
With ActiveSheet
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
For Each rw In sh.Rows
'If the data of one cell is empty EXIT THE LOOP
If sh.Cells(rw.Row, 1).Value = "" Then
Exit For
End If
If rw.Row > 1 Then
'Afffecting the variables to the next loop so we can compare the values
nextLocationName = sh.Cells(rw.Row + 1, 1).Value
nextPlanCode = sh.Cells(rw.Row + 1, 2).Value
nextLastName = sh.Cells(rw.Row + 1, 3).Value
nextFirstName = sh.Cells(rw.Row + 1, 4).Value
nextEmploymentDate = sh.Cells(rw.Row + 1, 5).Value
nextBirthDate = sh.Cells(rw.Row + 1, 6).Value
nextDependantFirstName = sh.Cells(rw.Row + 1, 25).Value
nextDependantLastName = sh.Cells(rw.Row + 1, 26).Value
nextDependantBirthdate = sh.Cells(rw.Row + 1, 27).Value
Debug.Print LastName & " - " & FirstName & " ::: " & nextLastName & " - " & nextFirstName & " : " & rw.Row & " : " & firstOccurence
'First time you pass through the loop write the whole lane
If firstOccurence = True Then
'Affecting the variables to the current loops values
LocationName = sh.Cells(rw.Row, 1).Value
PlanCode = sh.Cells(rw.Row, 2).Value
LastName = sh.Cells(rw.Row, 3).Value
FirstName = sh.Cells(rw.Row, 4).Value
dependantFirstName = sh.Cells(rw.Row, 25).Value
dependantLastName = sh.Cells(rw.Row, 26).Value
dependantBirthdate = sh.Cells(rw.Row, 27).Value
'Write the current line
sh.Rows(rw.Row).Copy
'We copy the value into another sheet
Set ns = Sheets("Benefits Census Formatted")
LastRow = ns.Cells(ns.Rows.Count, "A").End(xlUp).Row + 1
ns.Rows(LastRow).PasteSpecial xlPasteValues
firstOccurence = False
Else
'We match the location with the plan code and the last name and first name of the user to find duplicates
If dependantFirstName <> nextDependantFirstName And PlanCode <> nextPlanCode And LastName <> nextLastName And FirstName <> nextFirstName Then
'We find a different dependant if the first name or the last name or the birthdate differs
'If Not (dependantFirstName <> nextDependantFirstName) Or Not (dependantLastName <> nextDependantLastName) Or Not (dependantBirthdate <> nextDependantBirthdate) Then
'We have a dependant Append it to the line
'append the user to the currentLine
'End If
Else
'If the dependantFirstName and the nextDependant First name doesn't match then on the next loop we print the full line
firstOccurence = True
End If
End If
RowCount = RowCount + 1
'End of if row > 2
End If
Next rw
End With
End Sub
I would use an approach using Dictionaries to collect and organize the data, and then output it. Judging both by your comments, and the code, there is a lot of stuff you haven't included. But the following code will take your original data, and output a table close to what you show -- some of the results ordering is different, but it is standardized (i.e. there is a relation listed with every dependent name.
In the dictionary, we use Last Name and Birthdate as the "key" so as to combine what you stated were the duplicates.
We define two Class objects
Once we have it organized, it is relatively simple to output it as we want.
For a discussion of Classes, you can do an Internet search. I would recommend Chip Pearson's Introduction to Classes
Be sure to read the notes in the code about renaming the class modules, and also setting a reference to Microsoft Scripting Runtime
Class1
Class2
Regular Module
Source Data
Results
This is the code I wrote for you. (Glad to see that so many others did, too. So you got a choice :-))
The above code calls one Sub routine which you must add in the same code module which, by the way, should be a normal code module (by default "Module1" but you can rename it to whatever).
Observe that I inserted the word "Main" as hard text. You could also copy the content of the appropriate call in the Source sheet. This procedure only writes the first entry. Dependents are written by another code.
The entire code is controlled by two "enums", enumerations, one for each of the worksheets. Enums are the quickest way to assign names to numbers. Please paste these two enums at the top of your code sheet, before either of the procedures.
Note that the rule of enums is that you can assign any integer to them. If you don't assign any number the value will be one higher than the previous. So, NfdMain = 4, followed by NfdRelate which has an assigned value of 0, followed by NfdDepName which has a value of 0 + 1 = 1.
The numbers in these enumerations are columns (and rows). You can control the entire output by adjusting these numbers. For example, "Main" is written into column NfdMain (=4 =D). Change the value to 5 and "Main" will appear in column 5 = E. No need to go rummaging in the code. Consider this a control panel.
In the formatted output I introduced a logic which is slightly different from yours. If you don't like it you can change it easily by modifying the enums. My logic has the family name as the main criterion in the first column (switched from the raw data). In column D I write "Main". But when there is a dependent I write the relationship in column D. Therefore only entries without any dependents will have "Main" in that column. For your first example, the formatted row will show Rasmond / Shawn / 01-01-1990 / Spouse / Jessica, Child 1 / Vanessa.
If you wish to keep the "Main and place "Spouse" in the next column, just set the enum NfdRelate = 1. With the "control panel" it's that simple.