Is there a way to flag duplicate rows across multiple columns using an array formula?
Data:
AA1 BB1 CC2 duplicate
AA1 BB2 CC1
AA1 BB1 CC2 duplicate
AA1 BB1 CC1
In the above table, rows 1 and 3 are the ones I need to indicate, by putting "duplicate" in column 4.
I know of the remove duplicates functionality in Excel, but I have to see the duplicate lines before actually deleting them. Also, adding a hidden helper column is not an option because of what happens with the file further down in the process...
If the data were in just one column, a COUNTIF formula would work. So I was hoping that something like countif(col1 & col2 & col3, range(A:A & B:B & C:C))
could do the trick...
Thanks!
An array formula is not necessary here; COUNTIFS will do the job.
=COUNTIFS($A$1:$A$4,A1,$B$1:$B$4,B1,$C$1:$C$4,C1)
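The formula above returns a count rather than the text the question asks for. A minimal sketch of one way to turn it into a label, assuming the data sits in A1:C4 as in the question and the formula is entered in D1 and filled down:

```
=IF(COUNTIFS($A$1:$A$4,A1,$B$1:$B$4,B1,$C$1:$C$4,C1)>1,"duplicate","")
```

With the sample data, the count is 2 for rows 1 and 3 and 1 for rows 2 and 4, so only rows 1 and 3 should show "duplicate". Adjust the ranges to match the real extent of the data.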
Since the objective is to remove the duplicate lines while keeping the first occurrence of each, and a helper column is not an option, here is how to achieve it.
Using a slightly different formula from Adirmola's answer:
In column D, observe how the addresses are locked, e.g. A$1:A1 for the formula in row 1. As you fill the formula down, the row number on the left stays the same while the row number on the right increases. The formula therefore counts how many times the row's combination of values has occurred so far, including the current row.
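The formula itself is not reproduced in the text here. A formula consistent with the locking pattern described, assuming three data columns A to C and entry in D1, might look like this:

```
=COUNTIFS(A$1:A1,A1,B$1:B1,B1,C$1:C1,C1)
```

Filled down over the sample data in the question, this should give 1, 1, 2, 1: only row 3 gets a count above 1, because it is the second occurrence of AA1 BB1 CC2.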
Then, since adding a helper column is not an option, let's bring in conditional formatting to highlight the 2nd, 3rd, 4th, ... occurrences, filter by color, and delete them.
Here is how: first select the region where the duplicates occur. The active cell (the white cell in the otherwise grayed selection) must be in the first row of the selection.
Add a conditional formatting rule using the same formula as in column D above for row 1, but this time lock all the columns and append the condition
>1
at the end. Apply the rule, and you can go ahead and filter by color and delete the duplicates!
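As a sketch of the rule described above, assuming the selection is A1:C4 with the active cell in row 1 and the "use a formula to determine which cells to format" rule type:

```
=COUNTIFS($A$1:$A1,$A1,$B$1:$B1,$B1,$C$1:$C1,$C1)>1
```

Locking the columns with $ makes every cell in a row test the same three columns, so the whole row is highlighted rather than a single cell. On the sample data only row 3 should be colored.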
Additional info: COUNTIF and COUNTIFS are very inefficient on large data sets (roughly 10,000 rows and up, depending on how many columns are involved). Excel may respond slowly, so it is a good idea to delete the formula after removing the duplicate rows. Alternatively, wrap the formula in double quotes to disable it so that it can be reused next time.
="COUNTIFS($A$1:$A1,$A1,$B$1:$B1,$B1,$C$1:$C1,$C1) > 1"
Hope this helps
First you have to be clear about what a duplicate is: a later occurrence of a value that has already appeared. In your example, the first row is NOT a duplicate because there are no occurrences before it. The next matching row is a duplicate because it is the second occurrence. I have prepared a method for you to extract the duplicates and mark them as needed.
Formula in cell D1:
Formula in cell E1:
Formula in cell F1:
--Edit:
If you want to show all duplicates (including the original value):
Formula in cell D1:
Formula in cell E1:
Formula in cell F1:
Cheers!