R remove duplicate rows [duplicate]

Posted 2019-09-21 03:57

This question already has an answer here:

Finding ALL duplicate rows, including “elements with smaller subscripts” 5 answers

I have a data frame from which I would like to remove every row whose value is duplicated, including the first occurrence. For instance, my data frame looks like:

> df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"), B = c(1, 2, 3, 4, 5, 6))
> df
         A B
1    Happy 1
2    Happy 2
3      Sad 3
4 Confused 4
5      Mad 5
6      Mad 6

I only want the rows whose entry in A occurs exactly once, which should give:

         A B
1      Sad 3
2 Confused 4

Tags: r, unique

2 answers

疯言疯语

Reply #2 · 2019-09-21 04:32

You can try duplicated(), applied from both directions so that every copy of a repeated value is flagged:

df[!(duplicated(df$A)|duplicated(df$A,fromLast=TRUE)),]
#         A B
#3      Sad 3
#4 Confused 4
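
The reason both calls are needed can be seen by looking at the two logical vectors separately. The following is a minimal sketch using the data from the question; the names dup_forward, dup_backward, keep and result are introduced here for illustration only.

```r
# Same data as in the question.
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Forward scan: flags only the second and later occurrences of a value.
dup_forward <- duplicated(df$A)

# Backward scan: flags every occurrence except the last one.
dup_backward <- duplicated(df$A, fromLast = TRUE)

# A row is kept only if neither scan flagged it,
# i.e. its value in A occurs exactly once.
keep <- !(dup_forward | dup_backward)
result <- df[keep, ]
```

Using duplicated(df$A) on its own would leave the first "Happy" and the first "Mad" in place, which is not what the question asks for.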

df[df$A %in% with(as.data.frame(table(df$A)), Var1[Freq==1]),]
#       A B
#3      Sad 3
#4 Confused 4

df[colSums(sapply(df$A, `==`, df$A))==1,]
#         A B
#3      Sad 3
#4 Confused 4

df[colSums(Vectorize(function(x) x==df$A)(df$A))==1,]

or using data.table (similar to @beginneR's use of ave)

library(data.table)
setDT(df)[,.SD[.N==1], by=A] 
#          A B
#1:      Sad 3
#2: Confused 4

setDT(df)[df[,.I[.N==1], by=A]$V1]
#          A B
#1:      Sad 3
#2: Confused 4


何必那么认真

Reply #3 · 2019-09-21 04:51

akrun seems to be collecting different methods, so here's another one in base:

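# seq_along() is only a numeric placeholder for ave(); as.numeric(df$A)
# yields NA with a coercion warning when A is a character column
df[ave(seq_along(df$A), df$A, FUN = length) == 1,]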
#         A B
#3      Sad 3
#4 Confused 4
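
To see what the ave filter is comparing against 1, it may help to look at the per-row group sizes directly. This is a small sketch on the question's data; counts and result are names introduced here for illustration, and seq_along(df$A) is used only as a numeric placeholder whose values are ignored.

```r
# Same data as in the question.
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# For each row, ave() returns the result of FUN applied to that row's group.
# With FUN = length this is the number of rows sharing the same value of A.
counts <- ave(seq_along(df$A), df$A, FUN = length)

# Rows whose group has size 1 are the ones with a unique value in A.
result <- df[counts == 1, ]
```

Here counts holds one group size per row, so comparing it with 1 gives a logical index that can be used to subset df.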

(I guess the one with duplicated would be the most commonly used method)

Or using dplyr:

library(dplyr)
# keep only the groups of A that contain a single row
group_by(df, A) %>% filter(n() == 1)
