Cumulative number of unique values in a column up

Posted 2020-02-07 04:00

I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     50.00
054       2001-09-22    125.00
125       2001-11-05     40.00
042       2001-12-04     75.00
...           ...         ...

I'd like to create a column that shows the cumulative number of unique donor ids up to that date. I think it's something like:

donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))

Unfortunately, this isn't working, and I'm wondering how to fix it. Thanks for any suggestions.

Tags: r apply

2 answers

可以哭但决不认输i

Reply #2 · 2020-02-07 04:41

You can do this with duplicated() and cumsum(), taking advantage of the fact that logical vectors are coerced to numeric (TRUE becomes 1, FALSE becomes 0):

# Example data.frame with some duplicated ids
df <- read.table(text="
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header=TRUE)

cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3
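
To get the result as a column, as the question asks, the same idea can be assigned back to the data frame. This is a minimal sketch, not the answerer's code: the sample rows are made up for illustration, and it assumes the rows should be sorted by giftdate first so that "up to that date" lines up with row order.

```r
# Hypothetical sample data (not from the original post); ids kept as character
donorInfo <- data.frame(
  id       = c("002", "033", "002", "033", "042"),
  giftdate = as.Date(c("2001-01-05", "2001-05-08", "2001-09-22",
                       "2001-11-05", "2001-12-04")),
  giftamt  = c(25, 50, 125, 40, 75),
  stringsAsFactors = FALSE
)

# Sort by date so that "up to that date" matches row order
donorInfo <- donorInfo[order(donorInfo$giftdate), ]

# duplicated() is FALSE at the first occurrence of each id;
# cumsum() of the negation counts the distinct ids seen so far
donorInfo$numUnique <- cumsum(!duplicated(donorInfo$id))

donorInfo$numUnique
# [1] 1 2 2 2 3
```

Note that the count is per row, so if several gifts share the same date, rows within that date can show different values.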


We Are One

Reply #3 · 2020-02-07 04:46

Try something like this:

donorInfo$numUnique <- sapply(seq_len(nrow(donorInfo)), function(rn) {
  # count the distinct ids among the first rn rows
  length(unique(donorInfo$id[seq_len(rn)]))
})

Not the most efficient solution no doubt, but it should work.
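
As a rough check that this row-by-row approach agrees with a single pass using duplicated() and cumsum(), the two can be compared on random ids. This is only a sketch: the helper names cum_unique_fast and cum_unique_slow and the test data are hypothetical, introduced just for the comparison. The row-by-row version calls unique() on an ever-growing prefix, so its cost should grow much faster with the number of rows.

```r
# Hypothetical helper names, introduced only for this comparison

# Single pass: count first occurrences as they appear
cum_unique_fast <- function(x) cumsum(!duplicated(x))

# Row by row: recompute the number of distinct values in each prefix
cum_unique_slow <- function(x) {
  vapply(seq_along(x),
         function(i) length(unique(x[seq_len(i)])),
         integer(1))
}

set.seed(1)                                # made-up test data
ids <- sample(1:50, 200, replace = TRUE)

# Both approaches should give the same running count
same <- all(cum_unique_fast(ids) == cum_unique_slow(ids))
same
```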
