Yet another apply question.
I've reviewed a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below which I want to apply to every row of the dataframe inc (incdf in the sample data). I think I need some variant of apply(inc,1,myfun).
I've played around with it for a while, but still can't quite get it. I've included a loop which achieves exactly what I want to do... it's just super slow and inefficient on my real data which is considerably larger than the sample data I've included here.
I expect it's a quick fix, but I can't quite put my finger on it... maybe something with the special argument ... to apply?
English version of what the code below does: I want to look at all the Submit Dates in the inc dataframe and find, for each of these dates, how many rows in chg have the same ID and a chg$Submit.Date within some range of the inc$Submit.Date. The range is controlled by fdays and bdays in myfun.
# setting up some fake data
chgdf <- data.frame(Submit.Date=as.Date(c("2013-09-27", "2013-09-04", "2013-08-01", "2013-06-24", "2013-05-29", "2013-08-20")), ID=c('001', '001', '001', '001', '001', '005'), stringsAsFactors=FALSE)
incdf <- data.frame(Submit.Date=as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")), ID=c('001', '001', '002', '006'), stringsAsFactors=FALSE)
# the function I want to apply to every line of the data frame inc
myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # turn the day offsets into the upper and lower date bounds
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  # keep rows of chg with the same ID and a date strictly inside the window
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
# works for one line of the inc dataframe
aid <- '001'
tdate <- incdf[incdf$ID==aid, 'Submit.Date'][1]
myfun(tdate, aid=aid, bdays=50, fdays=100)
# works but slow... with full dataset
incdf$chgw <- 0
for(i in seq_len(nrow(incdf))){
  aid <- incdf$ID[i]
  tdate <- incdf$Submit.Date[i]
  incdf$chgw[i] <- myfun(tdate, aid, bdays=50, fdays=100)
}
First, when you call apply, all values are coerced to strings, so you need to convert tdate back to a Date before using it. Otherwise you're trying to add days to a string.

Second, you call apply(inc, 1, myfun). Note that in that case you're passing a single parameter to myfun (the whole row), and not several parameters as myfun is supposed to receive.

Solution 1: Change your function to receive a whole row of the dataframe and call as you did.
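The code that accompanied Solution 1 is not preserved here, so the following is only a sketch of what it could look like. It repeats the question's sample data so that it runs on its own. The wrapper name myfun_row is made up for illustration. It takes the whole row, which apply hands over as a named character vector, and converts the date back with as.Date before delegating to myfun.

```r
# Sample data and function from the question
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14",
                                            "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)
myfun <- function(tdate, aid, chg = chgdf, inc = incdf, fdays = 30, bdays = 30) {
  fdays <- tdate + fdays
  bdays <- tdate - bdays
  nrow(chg[chg$ID == aid & chg$Submit.Date < fdays & chg$Submit.Date > bdays, ])
}

# apply() converts the data frame to a matrix first, so every cell is a string
row_type <- typeof(as.matrix(incdf))

# Hypothetical wrapper (not from the original answer): receives one whole row,
# converts the date string back to a Date, and forwards extra arguments via ...
myfun_row <- function(row, ...) {
  myfun(as.Date(row[["Submit.Date"]]), row[["ID"]], ...)
}

res1 <- apply(incdf, 1, myfun_row, bdays = 50, fdays = 100)
```

Extra arguments given to apply after the function are passed through its ... argument, which is how bdays and fdays reach myfun in this sketch.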
Solution 2: Call apply using all parameters in the function call. I personally prefer the second solution, because it gives you the possibility to change the default values of your other parameters in myfun.
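The original snippet for Solution 2 is missing as well. One way to read it, offered here as an assumption rather than the answerer's exact code, is to leave myfun untouched and spell out every argument inside an anonymous function. The sample data is repeated so the block is self-contained.

```r
# Sample data and function from the question
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14",
                                            "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)
myfun <- function(tdate, aid, chg = chgdf, inc = incdf, fdays = 30, bdays = 30) {
  fdays <- tdate + fdays
  bdays <- tdate - bdays
  nrow(chg[chg$ID == aid & chg$Submit.Date < fdays & chg$Submit.Date > bdays, ])
}

# Each row arrives as a named character vector; name every argument explicitly
# and convert the date string back to a Date at the call site.
res2 <- apply(incdf, 1, function(row) {
  myfun(tdate = as.Date(row[["Submit.Date"]]),
        aid   = row[["ID"]],
        bdays = 50,
        fdays = 100)
})
```

Because the defaults are overridden at the call site, changing bdays or fdays (or passing a different chg) needs no change to myfun itself.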

Similar to Julian's answer. Here I don't use apply because apply will coerce the whole row to the same type, which may not be desirable. Note we need to unname(x) because your df doesn't have the same column names as args to your function.
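The code for this answer did not survive either, so what follows is a guess at the idea rather than a reconstruction of the original. It iterates over row indices with sapply, takes each row as a one-row data frame (so Submit.Date stays a Date), strips the column names with unname, and passes the values positionally to myfun through do.call.

```r
# Sample data and function from the question
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14",
                                            "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)
myfun <- function(tdate, aid, chg = chgdf, inc = incdf, fdays = 30, bdays = 30) {
  fdays <- tdate + fdays
  bdays <- tdate - bdays
  nrow(chg[chg$ID == aid & chg$Submit.Date < fdays & chg$Submit.Date > bdays, ])
}

res3 <- sapply(seq_len(nrow(incdf)), function(i) {
  x <- incdf[i, ]  # one-row data frame: column types are preserved
  # Drop the names Submit.Date and ID so the values match tdate and aid
  # by position; the named extras override the defaults.
  args <- c(unname(as.list(x)), list(bdays = 50, fdays = 100))
  do.call(myfun, args)
})
```

Without unname, do.call would try to match Submit.Date and ID as argument names, and myfun has no such arguments.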