Suppose I have a data.table like this:
Table:
V1 V2
A B
C D
C A
B A
D C
I want each row to be treated as an unordered set, so that B A and A B count as the same row. After removing the duplicates, I want to get:
V1 V2
A B
C D
C A
To do that, I first have to sort the table row by row and then use unique() to remove the duplicates. The sorting step is quite slow when I have millions of rows. Is there an easy way to remove the duplicates without sorting?
Borrowing (probably unrealistic) data from a dupe:
Here's a fast way if your data looks like this:
Data like this seems unlikely if it pertains to covariances, since each pair should then have at most one duplicate (i.e., A-B with B-A).
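As a rough sketch of one way to avoid row-wise sorting (not necessarily the approach this answer used), each label can be mapped to an integer code and each pair collapsed into a single order-independent numeric key. The table below is the example from the question; the names lev, i1, i2, pair_key and deduped_int are made up for illustration.

```r
library(data.table)

# Example table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Shared lookup so the same label gets the same integer code in both columns
lev <- sort(unique(c(dt$V1, dt$V2)))
i1 <- match(dt$V1, lev)
i2 <- match(dt$V2, lev)

# Order-independent key: smaller code * number of labels + larger code.
# as.numeric() avoids integer overflow when there are many distinct labels.
pair_key <- as.numeric(pmin(i1, i2)) * length(lev) + pmax(i1, i2)

# Keep the first row seen for every key
deduped_int <- dt[!duplicated(pair_key)]
```

Everything here is vectorised over whole columns, so no per-row function call is needed. Whether it is actually faster on a given data set would need to be measured.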
Here is a simple way of removing duplicate rows.
The result is as follows. Before:
After:
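A minimal sketch of a simple approach along these lines, assuming the example table from the question: sort the values within each row, then drop rows whose sorted form has already appeared. This is the row-sorting method the question describes as slow, so it is mainly useful as a readable baseline. The names sorted_rows and dedup_sorted are illustrative.

```r
library(data.table)

# Example table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Sort the values inside each row so that "B A" and "A B" become identical.
# apply() returns one column per input row, hence the transpose.
sorted_rows <- t(apply(dt, 1, sort))

# duplicated() on a matrix flags rows that repeat an earlier row
dedup_sorted <- dt[!duplicated(sorted_rows)]

print(dt)            # before
print(dedup_sorted)  # after
```

The original row contents are kept as they were (for example C A stays C A), since the sorted copy is used only to detect duplicates.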
For just two columns you can use the following trick:
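One way such a trick might look, sketched here under the assumption that both columns are character vectors (pmin() and pmax() do not work on unordered factors): group by the parallel minimum and maximum of the two columns and keep the first row index of each group. The column names lo, hi and idx are chosen for illustration.

```r
library(data.table)

# Example table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# pmin/pmax give an order-independent (lo, hi) pair for every row;
# .I[1] is the index of the first row that falls into each group.
keep <- dt[, .(idx = .I[1]), by = .(lo = pmin(V1, V2), hi = pmax(V1, V2))]$idx

dedup_pair <- dt[keep]
```

Because pmin() and pmax() operate on whole columns at once, this avoids calling sort() once per row.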