There are many questions about rolling regression in R, but here I am specifically looking for something that uses dplyr, broom and (if needed) purrr.
This is what makes this question different. I want to be tidyverse consistent. Is it possible to do a proper running regression with tidy tools such as purrr::map and dplyr?
Please consider this simple example:
library(dplyr)
library(purrr)
library(broom)
library(zoo)
library(lubridate)
mydata = data_frame('group' = c('a','a', 'a','a','b', 'b', 'b', 'b'),
'y' = c(1,2,3,4,2,3,4,5),
'x' = c(2,4,6,8,6,9,12,15),
'date' = c(ymd('2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04',
'2016-06-03', '2016-06-04', '2016-06-05','2016-06-06')))
group y x date
<chr> <dbl> <dbl> <date>
1 a 1.00 2.00 2016-06-01
2 a 2.00 4.00 2016-06-02
3 a 3.00 6.00 2016-06-03
4 a 4.00 8.00 2016-06-04
5 b 2.00 6.00 2016-06-03
6 b 3.00 9.00 2016-06-04
7 b 4.00 12.0 2016-06-05
8 b 5.00 15.0 2016-06-06
For each group (in this example, a or b):
- compute the rolling regression of y on x over the last 2 observations.
- store the coefficient of that rolling regression in a column of the dataframe.
Of course, as you can see, the rolling regression can only be computed for the last 2 rows in each group.
I have tried to use the following, but without success.
data %>% group_by(group) %>%
mutate(rolling_coef = do(tidy(rollapply(. ,
width=2,
FUN = function(df) {t = lm(formula=y ~ x,
data = as.data.frame(df),
na.rm=TRUE);
return(t$coef) },
by.column=FALSE, align="right"))))
Error in mutate_impl(.data, dots) :
Evaluation error: subscript out of bounds.
In addition: There were 21 warnings (use warnings() to see them)
Any ideas?
Expected output for the last two rows of the first group (a) is 0.5 and 0.5 (there is indeed a perfect linear correlation between y and x in this example).
More specifically:
mydata_1 <- mydata %>% filter(group == 'a',
row_number() %in% c(1,2))
# A tibble: 2 x 3
group y x
<chr> <dbl> <dbl>
1 a 1.00 2.00
2 a 2.00 4.00
> tidy(lm(y ~ x, mydata_1))['estimate'][2,]
[1] 0.5
and also
mydata_2 <- mydata %>% filter(group == 'a',
row_number() %in% c(2,3))
# A tibble: 2 x 3
group y x
<chr> <dbl> <dbl>
1 a 2.00 4.00
2 a 3.00 6.00
> tidy(lm(y ~ x, mydata_2))['estimate'][2,]
[1] 0.5
EDIT: an interesting follow-up to this question is here: rolling regression with confidence interval (tidyverse)
This is more of an idea than an answer, but maybe instead of using group_by, try using map and your list of groups:

Define a function Coef whose argument is formed from cbind(y, x) and which regresses y on x with an intercept, returning the coefficients. Then apply rollapplyr using the current and prior rows over each group. If by "last" you meant the 2 prior rows to the current row, i.e. exclude the current row, then replace 2 with list(-seq(2)) as an argument to rollapplyr.

giving:
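As a minimal sketch of the steps described in this answer (written for illustration, not the answer's original code): Coef takes one window of cbind(y, x) and returns the lm coefficients, and rollapplyr applies it to the current and prior row within each group. The column name rolling_slope and the choice to keep only the slope are assumptions made here.

```r
library(dplyr)
library(zoo)

# Example data from the question (date column omitted; it is not needed here).
mydata <- tibble(
  group = c("a", "a", "a", "a", "b", "b", "b", "b"),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  x = c(2, 4, 6, 8, 6, 9, 12, 15)
)

# Coef receives one window as a matrix with columns y and x,
# regresses y on x with an intercept, and returns both coefficients.
Coef <- function(m) {
  coef(lm(y ~ x, data = as.data.frame(m)))
}

# Roll over the current and prior row within each group and keep the slope.
# The first row of each group has no complete window, so fill = NA pads it.
result <- mydata %>%
  group_by(group) %>%
  mutate(
    rolling_slope = rollapplyr(
      cbind(y, x),
      width = 2,
      FUN = function(m) Coef(m)[["x"]],
      by.column = FALSE,
      fill = NA
    )
  ) %>%
  ungroup()

print(result)
```

To use the 2 prior rows and exclude the current one, as the answer notes, width = 2 would be replaced with width = list(-seq(2)).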
Variation
A variation of the above would be:

Slope Only
If only the slope is needed there are further simplifications possible. We use the fact that the slope equals cov(x, y) / var(x).
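The slope identity can be checked on the question's data with a short base R sketch. The helper name roll_slope and its width argument are hypothetical, introduced only for this illustration; the last window is cross-checked against lm.

```r
# Group "a" from the question.
x <- c(2, 4, 6, 8)
y <- c(1, 2, 3, 4)

# Rolling slope of y on x over the current and prior rows, computed as
# cov(x, y) / var(x). Rows without a complete window stay NA.
roll_slope <- function(x, y, width = 2) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < width) return(out)
  for (i in seq(width, n)) {
    idx <- (i - width + 1):i
    out[i] <- cov(x[idx], y[idx]) / var(x[idx])
  }
  out
}

slopes <- roll_slope(x, y)

# Cross-check the last window (rows 3 and 4) against lm().
last_window <- data.frame(x = x[3:4], y = y[3:4])
lm_slope <- unname(coef(lm(y ~ x, data = last_window))["x"])

print(slopes)
print(lm_slope)
```

Because the helper takes plain vectors, it could be called per group inside a dplyr pipeline, for example mydata %>% group_by(group) %>% mutate(rolling_coef = roll_slope(x, y)).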

Does this do what you're after?

Giving:
Edit: Slightly modified code, but data_frame will not accept the "." group placeholder as an argument; not sure how to fix that.

Edit 2: Using fill = NA rather than using c(NA, ...) achieves the same result.

Here is a solution similar to G. Grothendieck's answer but using the rollRegres package. I have to increase the width argument to 3 to avoid an error (by the way, why do you want a regression with so few observations?)