Finding ALL duplicate rows, including “elements wi

Posted 2018-12-31 02:13

R's duplicated returns a vector showing whether each element of a vector or data frame is a duplicate of an element with a smaller subscript. So if rows 3, 4, and 5 of a 5-row data frame are the same, duplicated will give me the vector

FALSE, FALSE, FALSE, TRUE, TRUE

But in this case I actually want to get

FALSE, FALSE, TRUE, TRUE, TRUE

that is, I want to know whether a row is duplicated by a row with a larger subscript too.

Tags: r duplicates r-faq

3 answers

旧人旧事旧时光

Reply #2 · 2018-12-31 02:43

I've had the same question, and if I'm not mistaken, this is also an answer (here vec is a data frame and col is one of its columns):

vec[vec$col %in% vec[duplicated(vec$col), ]$col, ]

I don't know which approach is faster, though; the dataset I'm currently using isn't big enough for timing tests to show a meaningful difference.


旧人旧事旧时光

Reply #3 · 2018-12-31 02:58

You need to assemble the set of duplicated values, apply unique, and then test with %in%. As always, a sample problem will make this process come alive.

> vec <- c("a", "b", "c","c","c")
> vec[ duplicated(vec)]
[1] "c" "c"
> unique(vec[ duplicated(vec)])
[1] "c"
> vec %in% unique(vec[ duplicated(vec)])
[1] FALSE FALSE  TRUE  TRUE  TRUE
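
The question asks about data frame rows, and %in% compares vector elements rather than whole rows, so the same idea needs one key per row. A minimal sketch, assuming a hypothetical data frame df and assuming that pasting the columns together with a separator that never occurs in the data ("\r" here) gives a usable row key:

```r
# Hypothetical 5-row data frame; rows 3, 4 and 5 are identical.
df <- data.frame(x = c(1, 2, 3, 3, 3),
                 y = c("a", "b", "c", "c", "c"))

# Build one key string per row (assumes "\r" never appears in the data).
key <- do.call(paste, c(df, sep = "\r"))

# Flag every row whose key occurs more than once, in either direction.
all_dups <- key %in% unique(key[duplicated(key)])
all_dups
```

If the columns could contain the separator, this key would not be reliable, so treat it as an illustration rather than a general solution.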


回忆，回不去的记忆

Reply #4 · 2018-12-31 03:01

duplicated has a fromLast argument. The "Examples" section of ?duplicated shows you how to use it. Just call duplicated twice, once with fromLast=FALSE and once with fromLast=TRUE, and take the rows where either is TRUE.

A late edit: you didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums:

vec <- c("a", "b", "c","c","c") 
vec[duplicated(vec) | duplicated(vec, fromLast=TRUE)]
## [1] "c" "c" "c"
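
Since the question is about rows of a data frame, here is the same approach applied to one. This is only a sketch: df is a made-up 5-row data frame in which rows 3, 4 and 5 are identical, matching the situation described in the question.

```r
# Hypothetical data frame; rows 3, 4 and 5 are the same.
df <- data.frame(x = c(1, 2, 3, 3, 3),
                 y = c("a", "b", "c", "c", "c"))

# TRUE for a row that repeats an earlier row OR is repeated by a later one.
all_dups <- duplicated(df) | duplicated(df, fromLast = TRUE)
all_dups

# Keep every row that has at least one duplicate.
dup_rows <- df[all_dups, ]
dup_rows
```

Here all_dups is the logical vector the question asks for, and negating it (df[!all_dups, ]) would keep only the rows that occur exactly once.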
