Yet another apply question.
I've reviewed a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below which I want to apply to every row of the dataframe inc. I think I need some variant of apply(inc,1,myfun).
I've played around with it for a while, but still can't quite get it. I've included a loop which achieves exactly what I want to do... it's just super slow and inefficient on my real data which is considerably larger than the sample data I've included here.
I expect it's a quick fix, but I can't quite put my finger on it... maybe something with the special argument ... to apply?
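One likely reason apply(inc,1,myfun) is awkward here, shown as a small sketch: apply() converts a data frame to a matrix before iterating, and with a Date column next to a character column that matrix is character. The two-row incdf below is a cut-down copy of the sample data, and row_classes is a name chosen only for this illustration.

```r
# Cut-down copy of the sample data: one Date column, one character column.
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14")),
  ID = c("001", "001"),
  stringsAsFactors = FALSE
)

# apply() works on a matrix, so the data frame is coerced first. Because the
# columns have mixed types, every cell becomes a string, and the function
# receives each row as a named character vector rather than a Date plus an ID.
row_classes <- apply(incdf, 1, function(r) class(r[["Submit.Date"]]))
print(row_classes)
```

So a function that does date arithmetic on its first argument would need to convert the string back with as.Date() if it were called through apply().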
English version of what the code below does: I want to look at all the Submit Dates in the inc dataframe (incdf in the sample data) and find, for each of these dates, how many rows in chg (chgdf in the sample data) there are where chg$Submit.Date is within some range of inc$Submit.Date. The range is controlled by fdays and bdays in myfun.
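As a minimal sketch of that window for a single date, assuming the bounds are exclusive and measured in days (the names lower, upper, chg_dates and n_in_window are made up for this illustration):

```r
# One date to check, plus how far back and forward to look (in days).
tdate <- as.Date("2013-10-19")
bdays <- 50
fdays <- 100

# Adding or subtracting a number from a Date shifts it by that many days.
lower <- tdate - bdays
upper <- tdate + fdays

# A few candidate dates, as they might appear in chg$Submit.Date.
chg_dates <- as.Date(c("2013-09-27", "2013-09-04", "2013-08-01"))

# Count the dates that fall strictly inside the window.
in_window <- chg_dates > lower & chg_dates < upper
n_in_window <- sum(in_window)
print(n_in_window)
```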
Setting up some fake data:
chgdf <- data.frame(Submit.Date=as.Date(c("2013-09-27", "2013-09-04", "2013-08-01", "2013-06-24", "2013-05-29", "2013-08-20")), ID=c('001', '001', '001', '001', '001', '005'), stringsAsFactors=FALSE)
incdf <- data.frame(Submit.Date=as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")), ID=c('001', '001', '002', '006'), stringsAsFactors=FALSE)
The function I want to apply to every row of the data frame inc:
myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # turn the day offsets into the upper and lower bounds of the date window
  upper <- tdate + fdays
  lower <- tdate - bdays
  # keep rows of chg with the same ID that fall strictly inside the window
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<upper & chg$Submit.Date>lower, ]
  nrow(chg2)
}
Works for one row of the inc data frame:
aid <- '001'
tdate <- incdf[incdf$ID==aid, 'Submit.Date'][1]
myfun(tdate, aid=aid, bdays=50, fdays=100)
Works, but slow with the full dataset:
incdf$chgw <- 0
for (i in seq_len(nrow(incdf))) {
  aid <- incdf$ID[i]
  tdate <- incdf$Submit.Date[i]
  incdf$chgw[i] <- myfun(tdate, aid, bdays=50, fdays=100)
}
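For comparison, one way the loop might be written with the apply family is mapply(), which walks several vectors in parallel instead of iterating over rows of a matrix. This is only a sketch on the sample data: the data and myfun are restated so the block runs on its own, the window bounds are named upper and lower for readability, and whether it is meaningfully faster on the real data is untested here.

```r
# Sample data from the question, restated so the block is self-contained.
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Same logic as myfun in the question: count rows of chg with a matching ID
# whose Submit.Date lies strictly inside (tdate - bdays, tdate + fdays).
myfun <- function(tdate, aid, chg = chgdf, fdays = 30, bdays = 30) {
  upper <- tdate + fdays
  lower <- tdate - bdays
  sum(chg$ID == aid & chg$Submit.Date < upper & chg$Submit.Date > lower)
}

# mapply() calls myfun once per (date, ID) pair; MoreArgs carries the
# arguments that stay the same for every call.
incdf$chgw <- mapply(myfun, incdf$Submit.Date, incdf$ID,
                     MoreArgs = list(bdays = 50, fdays = 100))
print(incdf)
```

On the sample data this should give the same chgw column as the loop (2, 3, 0, 0). Note that mapply() is still a loop under the hood, so any speed-up may be modest; its main benefit here is that the Date column keeps its class, unlike with apply().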