R's duplicated returns a vector showing whether each element of a vector or data frame is a duplicate of an element with a smaller subscript. So if rows 3, 4, and 5 of a 5-row data frame are the same, duplicated will give me the vector
FALSE, FALSE, FALSE, TRUE, TRUE
But in this case I actually want to get
FALSE, FALSE, TRUE, TRUE, TRUE
that is, I want to know whether a row is duplicated by a row with a larger subscript too.
I've had the same question, and if I'm not mistaken, this is also an answer.
Dunno which one is faster, though; the dataset I'm currently using isn't big enough for timing tests to show a meaningful difference.
You need to assemble the set of duplicated values, apply unique, and then test with %in%. As always, a sample problem will make this process come alive.
duplicated has a fromLast argument. The "Example" section of ?duplicated shows you how to use it. Just call duplicated twice, once with fromLast=FALSE and once with fromLast=TRUE, and take the rows where either is TRUE.
Some late Edit: You didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums
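For what it's worth, here is a minimal sketch of both suggestions, run on a made-up data frame and vector. The names df, x, all_dups and in_dups are mine, not from the original posts, and this is not necessarily the illustration mentioned above.

```r
# Made-up 5-row data frame in which rows 3, 4 and 5 are identical.
df <- data.frame(a = c(1, 2, 3, 3, 3),
                 b = c("x", "y", "z", "z", "z"))

# Default scan (fromLast = FALSE): flags a row only if an identical
# row appears earlier.
duplicated(df)

# Reverse scan: flags a row only if an identical row appears later.
duplicated(df, fromLast = TRUE)

# A row has a twin somewhere if either scan flags it.
all_dups <- duplicated(df) | duplicated(df, fromLast = TRUE)
all_dups

# Rows that are duplicated anywhere in the data frame.
df[all_dups, ]

# The unique / %in% route, shown on a plain vector:
# collect the duplicated values, reduce them to a unique set,
# then test every element for membership in that set.
x <- c("p", "q", "r", "r", "r")
in_dups <- x %in% unique(x[duplicated(x)])
in_dups
```

Both all_dups and in_dups flag positions 3, 4 and 5. The %in% version is shown on a vector because %in% does not compare data frames row by row; for a data frame, the two-call fromLast version is the simpler route.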