I have a simple dataframe like this:
| id1 | id2 | location | comment |
|-----|-----|------------|-----------|
| 1 | 2 | Alaska | cold |
| 2 | 1 | Alaska | freezing! |
| 3 | 4 | California | nice |
| 4 | 5 | Kansas | boring |
| 9 | 10 | Alaska | cold |
The first two rows are duplicates because `id1` and `id2` both went to Alaska. It doesn't matter that their comments are different.

How can I remove one of these duplicates? Either one would be fine to remove.

I first tried sorting `id1` and `id2`, then getting the index where they are duplicated, then going back and using that index to subset the original df. But I can't seem to pull this off.
```r
df <- data.frame(id1 = c(1, 2, 3, 4, 9),
                 id2 = c(2, 1, 4, 5, 10),
                 location = c('Alaska', 'Alaska', 'California', 'Kansas', 'Alaska'),
                 comment = c('cold', 'freezing!', 'nice', 'boring', 'cold'))
```
We can use `apply` with `MARGIN = 1` to `sort` the 'id' columns within each row, `cbind` the result with 'location', and then use `duplicated` to get a logical index for removing/keeping the rows.
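A sketch of that approach, using the question's `df`:

```r
df <- data.frame(id1 = c(1, 2, 3, 4, 9),
                 id2 = c(2, 1, 4, 5, 10),
                 location = c('Alaska', 'Alaska', 'California', 'Kansas', 'Alaska'),
                 comment = c('cold', 'freezing!', 'nice', 'boring', 'cold'))

# Sort each id pair within its row so (1, 2) and (2, 1) look identical
sorted_ids <- t(apply(df[c('id1', 'id2')], 1, sort))

# Pair the sorted ids with location; duplicated() flags repeats of that combination
keep <- !duplicated(cbind(sorted_ids, df$location))

df[keep, ]
```

This keeps the first of each duplicate pair, so row 2 (`freezing!`) is dropped while row 5 (ids 9 and 10, also Alaska) survives because its ids differ. Note that `cbind` coerces the matrix to character when 'location' is added, which is fine here since `duplicated` only needs rows to compare equal.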