Get rid of rows with duplicate attributes in R

Posted 2019-03-11 05:34

I have a big dataframe with columns such as:

ID, time, OS, IP

Each row of that dataframe corresponds to one entry. For some IDs, several entries (rows) exist. I would like to get rid of those extra rows (obviously the other attributes will differ for the same ID). Put differently: I want only one entry (row) for each ID.

When I use unique on the ID column, I only receive the levels (or each unique ID), but I want to keep the other attributes as well. I have tried to use apply(x,2,unique(data$ID)), but this does not work either.

Tags: r duplicates dataframe

2 answers

放我归山

Reply #2 · 2019-03-11 06:20

If you want to keep one row for each ID, but there is different data on each row, then you need to decide on some logic to discard the additional rows. For instance:

df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
df
  ID time    OS
1  1    1 Linux
2  2    2 Linux
3  2    3 Linux
4  3    4 Linux

Now I will keep the maximum time value and the last OS value:

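library(plyr)
# ID=x[,"ID"] repeats the ID once per row in the group, so each group yields
# identical rows; the outer unique() collapses them to one row per ID.
unique(ddply(df, .(ID), function(x) data.frame(ID=x[,"ID"], time=max(x$time), OS=tail(x$OS,1))))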
  ID time    OS
1  1    1 Linux
2  2    3 Linux
4  3    4 Linux
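
For readers who would rather not depend on plyr, the same rule (maximum time, last OS per ID) can be sketched in base R. This is only one possible approach; the helper name keep_one is hypothetical and the data is the toy example from above.

```r
# Same toy data as in the example above
df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# Hypothetical helper: collapse all rows of one ID into a single row,
# keeping the maximum time and the last OS value
keep_one <- function(x) {
  data.frame(ID = x$ID[1], time = max(x$time), OS = tail(x$OS, 1))
}

# Split by ID, summarise each piece, then stack the one-row results
result <- do.call(rbind, lapply(split(df, df$ID), keep_one))
rownames(result) <- NULL
result
```

Because each group is reduced to exactly one row inside the helper, no final unique() call is needed here.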

ら.Afraid

Reply #3 · 2019-03-11 06:27

subset(data,!duplicated(data$ID))

That should do the trick. It keeps the first row found for each ID and drops the rest.
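
As a rough illustration of how duplicated() picks the surviving row, here is a small sketch on toy data (the data frame and the variable names below are assumptions for the example, not the asker's real data). It also shows two variations: keeping the last row instead of the first, and sorting first so that the row with the largest time is kept.

```r
# Toy data: ID 2 appears twice
data <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the first row per ID
first_rows <- data[!duplicated(data$ID), ]

# fromLast = TRUE scans from the end, so the last row per ID is kept
last_rows <- data[!duplicated(data$ID, fromLast = TRUE), ]

# To control which row survives, sort first: here by ID, then time descending,
# so the first row per ID is the one with the largest time
sorted <- data[order(data$ID, -data$time), ]
max_time_rows <- sorted[!duplicated(sorted$ID), ]
```

Which variant is appropriate depends on which of the differing attributes should win for each ID.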
