R: rollapplyr and lm factor error: Does rollapplyr

Posted 2019-06-06 15:09

This question builds upon a previous one which was nicely answered for me here.

R: Grouped rolling window linear regression with rollapply and ddply

Wouldn't you know that the code doesn't quite work when extended to the real data rather than the example data?

I have a somewhat large dataset with the following characteristics.

str(T0_satData_reduced)
'data.frame':   45537 obs. of  5 variables:
 $ date   : POSIXct, format: "2014-11-17 08:47:35" "2014-11-17 08:47:36" "2014-11-17 08:47:37" ...
 $ trial  : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ vial   : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ O2sat  : num  95.1 95.1 95.1 95.1 95 95.1 95.1 95.2 95.1 95 ...
 $ elapsed: num  20 20 20.1 20.1 20.1 ...

The previous question dealt with the desire to apply a rolling regression of O2sat as a function of elapsed, but grouping the regressions by the factors trial and vial.

The following code is drawn from the answer to my previous question, modified only to use the complete dataset rather than the practice one:

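library(zoo)   # rollapplyr
library(plyr)  # ddply

# Rolling regression over 600-row windows, evaluated every 60 rows
rolled <- function(df) {
  rollapplyr(df, width = 600, function(m) {
    coef(lm(formula = O2sat ~ elapsed, data = as.data.frame(m)))
  }, by = 60, by.column = FALSE)
}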

T0_slopes <- ddply(T0_satData_reduced, .(trial,vial), function(d) rolled(d))

However, when I run this code I get a series of warnings (the first two are shown here).

Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors

I'm not sure where these warnings come from, since I have shown that both elapsed and O2sat are numeric, so I am not regressing on factors. However, they go away if I force both variables to be numeric within the rolled function above, like this:

...
coef(lm(formula = as.numeric(O2sat) ~ as.numeric(elapsed), data = as.data.frame(m)))
...

I no longer get the warnings, but I don't know why this would fix the problem. Additionally, the resulting regressions look suspect, because the intercept terms seem inappropriately small.

Any thoughts on why I am getting these warnings, and why using as.numeric seems to eliminate them (while possibly still giving inappropriate regression terms)?

Thank you

1 answer
Root(大扎)
#2 · 2019-06-06 15:36

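rollapply passes a matrix to the function, so pass only the numeric columns. A matrix can hold only one type, so including the factor and date columns forces the numeric columns to be coerced as well. Using rolled from my prior answer and the setup in that question: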

do.call("rbind", by(dat[c("x", "y")], dat[c("w", "z")], rolled))
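The coercion behind the warnings can be reproduced in base R without zoo. The following is a minimal sketch on made-up data (the data frame d and its values are hypothetical, not taken from the question); it assumes the window reaches lm as a matrix converted back with as.data.frame, as in the question's rolled function. It also suggests why as.numeric silences the warnings yet gives suspect coefficients: on a factor it returns level codes rather than the measured values.

```r
# Toy data standing in for the question's data frame (hypothetical values).
d <- data.frame(
  trial   = factor(c("1", "1", "1", "1")),
  O2sat   = c(95.1, 95.0, 94.8, 94.7),
  elapsed = c(20.0, 20.1, 20.2, 20.3)
)

# A matrix holds a single type, so mixing a factor with numeric columns
# produces a character matrix.
m <- as.matrix(d)
is.character(m)

# Converting back yields factors. This was the default before R 4.0.0;
# it is requested explicitly here so the result does not depend on the R version.
back <- as.data.frame(m, stringsAsFactors = TRUE)
is.factor(back$O2sat)

# as.numeric() on a factor returns integer level codes, not the original values,
# so a regression on these codes has a very different intercept.
codes <- as.numeric(back$O2sat)

# Keeping only the numeric columns preserves a numeric matrix.
m_num <- as.matrix(d[c("O2sat", "elapsed")])
is.numeric(m_num)
```

With the full data set, the same idea amounts to passing only the O2sat and elapsed columns to the rolling function and keeping trial and vial for the grouping.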

Added

Another way to do it is to perform the rollapply over the row indexes instead of over the data frame itself. In this example we have also added the conditioning variables as extra output columns:

rolli <- function(ix) {
   data.frame(coef = rollapplyr(ix, width = 6, function(ix) { 
         coef(lm(y ~ x, data = dat, subset = ix))[2]
      }, by = 3), w = dat$w[ix][1], z = dat$z[ix][1])
}
do.call("rbind", by(1:nrow(dat), dat[c("w", "z")], rolli))
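For readers who want to try the index-based approach with the question's column names, here is a self-contained sketch. The data are simulated, the window sizes are shrunk to fit the toy data, and the helper name roll_slopes is hypothetical; it also uses split and lapply rather than by for the grouping, which is a substitution for illustration and not the code from the answer above.

```r
library(zoo)

set.seed(1)
# Simulated stand-in for the question's data: 2 trials x 2 vials, 12 rows each.
dat <- data.frame(
  trial   = factor(rep(1:2, each = 24)),
  vial    = factor(rep(rep(1:2, each = 12), times = 2)),
  elapsed = rep(seq_len(12), times = 4)
)
dat$O2sat <- 95 - 0.1 * dat$elapsed + rnorm(nrow(dat), sd = 0.05)

# Roll over row indexes, so lm() always sees the original numeric columns
# and nothing is coerced through a matrix.
roll_slopes <- function(ix) {
  slope <- rollapplyr(ix, width = 6, by = 3, FUN = function(i) {
    rows <- as.integer(i)  # row numbers of the current window
    coef(lm(O2sat ~ elapsed, data = dat[rows, ]))[[2]]
  })
  data.frame(trial = dat$trial[ix][1], vial = dat$vial[ix][1], slope = slope)
}

# One vector of row indexes per trial/vial combination.
groups <- split(seq_len(nrow(dat)), list(dat$trial, dat$vial), drop = TRUE)
slopes <- do.call(rbind, lapply(groups, roll_slopes))
```

Each row of slopes then carries the trial, the vial, and one rolling slope estimate. For the real data, width = 600 and by = 60 from the question would replace the small values used here.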