Cumulative number of unique values in a column up to the current row

Posted 2020-02-07 04:00

I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     50.00
054       2001-09-22    125.00
125       2001-11-05     40.00
042       2001-12-04     75.00
...           ...         ...

I'd like to create a column that shows the cumulative number of unique donor IDs up to that date. I think it's something like:

donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))

Unfortunately, this isn't working, and I'm wondering how to fix it. Thanks for any suggestions.
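One likely reason the attempt fails: `unique(donorInfo$id)` returns a plain vector, and `nrow()` of a vector is `NULL`, so `length()` is what counts its elements. In addition, `FUN` must be a function, so writing `FUN=nrow(...)` passes the result of that call rather than a function. `apply()` over rows would also hand each row to `FUN` on its own, so it could never see the earlier rows. The small data frame below is an assumed reconstruction of the first rows shown above, with `id` kept as character so leading zeros survive.

```r
# Assumed reconstruction of the first rows shown in the question;
# id is kept as character so leading zeros such as "002" are preserved
donorInfo <- data.frame(
  id       = c("002", "033", "054", "125", "042"),
  giftdate = as.Date(c("2001-01-05", "2001-05-08", "2001-09-22",
                       "2001-11-05", "2001-12-04")),
  giftamt  = c(25, 50, 125, 40, 75),
  stringsAsFactors = FALSE
)

ids <- unique(donorInfo$id)
nrow(ids)    # NULL: a plain vector has no dim attribute
length(ids)  # 5: length() counts the elements of a vector
```

This only counts the distinct ids in the whole column; the running, row-by-row count is what the answers address.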

Tags: r apply
2 answers
可以哭但决不认输i
Post #2 · 2020-02-07 04:41

You can do this with duplicated() and cumsum(), taking advantage of the fact that logical vectors are coerced to numeric when summed (TRUE counts as 1, FALSE as 0):

# Example data.frame with some duplicated ids
df <- read.table(text="
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header=TRUE)

cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3
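To see why this works, the intermediate vectors can be inspected one step at a time on the same example data. Because the count is meant to be "up to that date", the rows need to be in date order first; the sketch below assumes `giftdate` holds ISO-formatted date strings, sorts with `order(as.Date(...))`, and stores the result in the `numUnique` column named in the question.

```r
# Same example data as above, with some duplicated ids
df <- read.table(text = "
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header = TRUE)

# Sort by date so that "cumulative" means "up to this date"
df <- df[order(as.Date(df$giftdate)), ]

dup   <- duplicated(df$id)      # FALSE FALSE TRUE TRUE FALSE: id already seen earlier
first <- !dup                   # TRUE marks the first appearance of each id
df$numUnique <- cumsum(first)   # TRUE adds 1, FALSE adds 0
df$numUnique                    # 1 2 2 2 3
```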
We Are One
Post #3 · 2020-02-07 04:46

Try something like this:

# For each row rn, count the distinct ids among rows 1..rn
donorInfo$numUnique <- sapply(seq_len(nrow(donorInfo)), function(rn) {
  length(unique(donorInfo$id[seq_len(rn)]))
})

It's no doubt not the most efficient solution, since every row rescans all of the earlier ids, but it should work.
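A quick way to check the two answers against each other is to run both on the same data. The sketch below reuses the example rows from the first answer and swaps in `vapply()` with `seq_len()`, which guarantees an integer result and returns an empty vector for a zero-row data frame. The row-by-row version does work roughly proportional to the square of the number of rows, while `cumsum(!duplicated(...))` is a single, roughly linear pass.

```r
# Example rows from the first answer (ids 2 and 33 repeat)
df <- data.frame(
  id       = c(2, 33, 2, 33, 42),
  giftdate = c("2001-01-05", "2001-05-08", "2001-09-22",
               "2001-11-05", "2001-12-04"),
  giftamt  = c(25, 50, 125, 40, 75)
)

# Row-by-row version: count distinct ids among rows 1..rn
slow <- vapply(seq_len(nrow(df)), function(rn) {
  length(unique(df$id[seq_len(rn)]))
}, integer(1))

# Vectorised version: count first appearances cumulatively
fast <- cumsum(!duplicated(df$id))

stopifnot(all(slow == fast))  # both give 1 2 2 2 3
```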
