I wonder if it is possible to use lm() within mutate() of dplyr package. Currently I have a dataframe of "date", "company", "return" and "market.ret" reproducible as below:
library(dplyr)
n.dates <- 60
n.stocks <- 2
date <- seq(as.Date("2011-07-01"), by=1, len=n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date, symbol)
x$return <- rnorm(n.dates*n.stocks, 0, sd = 0.05)
names(x) <- c("date", "company", "return")
x <- group_by(x, date)
x <- mutate(x, market.ret = mean(x$return, na.rm = TRUE))
Now for each company I would like to fit "return" by "market.ret", calculate the linear regression coefficient and store the slopes in a new column. I wish to do it with mutate(), but the below code does not work:
x <- group_by(x, company)
x <- mutate(x, beta = coef(lm(x$return~x$market.ret))[[2]])
The error reported by R is:
Error in terms.formula(formula, data = data) :
invalid term in model formula
Thanks in advance for any suggestion!
You seem to want to calculate a daily market return across all companies, and then regress return vs. market return for each company, across all days. If so, here's a solution using
data.table
; likely to be faster with very large datasets.where
x
is as shown below (usingset.seed
for reproducibility):This seems to work for me: