I have a data frame with 1000 observations on 20 variables.
I want to select only the rows that have a unique combination across columns, regardless of their order.
That is, if one combination is `ABA` and another is `BAA`, I want the code to return only one of these combinations.
To identify unique combinations, I run a simple `unique` command across multiple variables.
How would you write such code?
We can `sort` the data by row using `apply` with `MARGIN=1`, then use `duplicated` to return the logical index, negate it, and get the `unique` rows in the data.
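The steps above can be sketched as follows. This is a minimal example under stated assumptions: the data frame `df` and its columns `v1` to `v3` are made up for illustration and stand in for the real 1000 x 20 data, and the data are assumed to contain no missing values.

```r
# Hypothetical example data: rows 1 and 2 hold the same values in a different order.
df <- data.frame(
  v1 = c("A", "B", "A", "C"),
  v2 = c("B", "A", "A", "C"),
  v3 = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Sort the values within each row (MARGIN = 1). apply() returns each sorted
# row as a column, so transpose to get back one row per observation.
sorted_rows <- t(apply(df, MARGIN = 1, FUN = sort))

# duplicated() on a matrix flags rows that repeat an earlier row;
# negating it keeps the first occurrence of each combination.
keep <- !duplicated(sorted_rows)

# Subset the original (unsorted) data with the logical index.
unique_df <- df[keep, ]
print(unique_df)
```

Here rows 1, 3, and 4 are kept, and row 2 is dropped because it is a reordering of row 1. Note that `sort` drops `NA` values by default, so if the data may contain missing values, something like `sort(x, na.last = TRUE)` would likely be needed to keep every sorted row the same length.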