Is there a faster way to do this?
Dim data As Range, unique As Object, x As Long
Set data = ws.UsedRange
Set unique = CreateObject("Scripting.Dictionary")
On Error Resume Next ' swallow the error .Add raises for duplicate keys
For x = 1 To data.Rows.Count
    unique.Add data(x, some_column_number).Value, 1
Next x
On Error GoTo 0
At this point unique.Keys gives me what I need, but the loop itself is very slow on files with tens of thousands of records (whereas this would not be a problem at all in a language like Python or C++).
Try this; it should be quite fast and without the drawback NeepNeepNeep mentioned:
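A minimal sketch of that approach, assuming the values sit in column A of the active sheet and the unique list should land in column B:

Sub CopyUniqueValues()
    Dim d As Object, vals As Variant, i As Long, lastRow As Long
    Set d = CreateObject("Scripting.Dictionary")
    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        vals = .Range("A1:A" & lastRow).Value
        For i = 1 To UBound(vals, 1)
            ' Index assignment adds a new key and is a no-op for an
            ' existing one, so no On Error handling is needed.
            d(vals(i, 1)) = 1
        Next i
        ' Write all unique keys to column B in a single operation.
        .Range("B1").Resize(d.Count, 1).Value = Application.Transpose(d.Keys)
    End With
End Sub

The single bulk read and single bulk write are what make this fast; the loop itself only touches the in-memory array.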
Loading the values in an array would be much faster:
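A sketch using the question's own ws and some_column_number, with the dictionary's index assignment replacing the error-guarded .Add:

Dim data As Variant, unique As Object, x As Long
Set unique = CreateObject("Scripting.Dictionary")
' One bulk read into a Variant array instead of one Range call per row.
data = ws.UsedRange.Value
For x = 1 To UBound(data, 1)
    unique(data(x, some_column_number)) = Empty
Next x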
You should also consider early binding for the Scripting.Dictionary:
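That is, after adding a project reference to Microsoft Scripting Runtime (Tools > References):

' Requires a reference to "Microsoft Scripting Runtime".
Dim unique As New Scripting.Dictionary

Early binding avoids the per-call overhead of late-bound IDispatch lookups and gives you IntelliSense and compile-time checking.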
Note that using a dictionary is way faster than Range.AdvancedFilter on large data sets.
As a bonus, here's a procedure similar to Range.RemoveDuplicates to remove duplicates from a 2D array:
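A sketch of such a procedure; the composite-key approach and the name RemoveDuplicates2D are assumptions, not necessarily the original answer's exact code:

' Returns a 2D array (1-based) holding only the first occurrence of each
' row, where row identity is defined by the columns listed in keyCols.
Public Function RemoveDuplicates2D(data As Variant, keyCols As Variant) As Variant
    Dim seen As Object, result() As Variant, key As String
    Dim r As Long, c As Long, k As Long, n As Long
    Set seen = CreateObject("Scripting.Dictionary")
    ReDim result(1 To UBound(data, 1), 1 To UBound(data, 2))
    For r = 1 To UBound(data, 1)
        ' Build a composite key from the chosen columns.
        key = vbNullString
        For k = LBound(keyCols) To UBound(keyCols)
            key = key & "|" & CStr(data(r, keyCols(k)))
        Next k
        If Not seen.Exists(key) Then
            seen(key) = Empty
            n = n + 1
            For c = 1 To UBound(data, 2)
                result(n, c) = data(r, c)
            Next c
        End If
    Next r
    RemoveDuplicates2D = result
End Function

Usage would look like deduped = RemoveDuplicates2D(ws.UsedRange.Value, Array(1)). Note that rows beyond the seen.Count unique ones are left Empty, since ReDim Preserve cannot shrink the first dimension of a 2D array.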
PowerShell is a very powerful and efficient tool. This is cheating a little, but shelling out to PowerShell from VBA opens up lots of options.
The bulk of the code below simply saves the current sheet as a CSV file; the output is another CSV file containing just the unique values.
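A sketch of that idea; the temp-file paths and the PowerShell pipeline (which deduplicates whole CSV lines) are assumptions:

Sub UniqueViaPowerShell()
    Dim srcCsv As String, outCsv As String, cmd As String
    srcCsv = Environ$("TEMP") & "\data.csv"
    outCsv = Environ$("TEMP") & "\unique.csv"
    ' Save a copy of the active sheet as CSV without touching the workbook.
    Application.DisplayAlerts = False
    ActiveSheet.Copy                      ' copies the sheet into a new workbook
    ActiveWorkbook.SaveAs srcCsv, xlCSV
    ActiveWorkbook.Close SaveChanges:=False
    Application.DisplayAlerts = True
    ' Let PowerShell write only the unique lines to the output file.
    cmd = "powershell -NoProfile -Command ""Get-Content '" & srcCsv & _
          "' | Select-Object -Unique | Set-Content '" & outCsv & "'"""
    Shell cmd, vbHide                     ' note: Shell runs asynchronously
End Sub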
Use Excel's AdvancedFilter method to do this.
Using Excel's inbuilt C++ (AdvancedFilter) is the fastest way for smaller datasets; the dictionary is faster for larger datasets. For example:
Copy the values in column A and insert the unique values into column B:
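Along these lines (the exact range addresses are placeholders):

Range("A1:A50").AdvancedFilter Action:=xlFilterCopy, _
    CopyToRange:=Range("B1"), Unique:=True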
It works with multiple columns too:
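For example, copying the unique combinations of two columns to D:E (again, placeholder addresses):

Range("A1:B50").AdvancedFilter Action:=xlFilterCopy, _
    CopyToRange:=Range("D1:E1"), Unique:=True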
Be careful with multiple columns, as it doesn't always work as expected. In those cases I resort to Range.RemoveDuplicates, which works by choosing a selection of columns on which to base uniqueness. Ref: MSDN - Find and remove duplicates
Here I remove duplicate rows based on the third column:
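Along the lines of the MSDN example (placeholder range):

Range("A1:C50").RemoveDuplicates Columns:=3, Header:=xlNo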
Here I remove duplicate rows based on the second and third columns:
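Something like (placeholder range):

Range("A1:C50").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo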