I have data which I regularly run regressions on. Each "chunk" of data gets fit a different regression. Each state, for example, might have a different function that explains the dependent value. This seems like a typical "split-apply-combine" type of problem so I'm using the plyr package. I can easily create a list of lm()
objects which works well. However I can't quite wrap my head around how I use those objects later to predict values in a separate data.frame.
Here's a totally contrived example illustrating what I'm trying to do:
# setting up some fake data
set.seed(1)
funct <- function(myState, myYear){
rnorm(1, 100, 500) + myState + (100 * myYear)
}
state <- 50:60
year <- 10:40
myData <- expand.grid( year, state)
names(myData) <- c("year","state")
myData$value <- apply(myData, 1, function(x) funct(x[2], x[1]))
## ok, done with the fake data generation.
require(plyr)
modelList <- dlply(myData, "state", function(x) lm(value ~ year, data=x))
## if you want to see the summaries of the lm() do this:
# lapply(modelList, summary)
state <- 50:60
year <- 50:60
newData <- expand.grid( year, state)
names(newData) <- c("year","state")
## now how do I predict the values for newData$value
# using the regressions in modelList?
So how do I use the lm()
objects contained in modelList
to predict values using the year and state independent values from newData
?
I take it the hard part is matching each state in
newData
to the corresponding model.Something like this perhaps?
Here I used a "hacky" way of extracting the corresponding state model:
as.character(min(x$state))
...There is probably a better way?
Output:
Or, if you want a
data.frame
as output:Output:
What is wrong with
?
EDIT:
Thanks for explaining what is wrong with that. How about:
Iterate over the models, and apply the new data (which is the same for each state since you just did an
expand.grid
to create it).EDIT 2:
If
newData
does not have the same values foryear
for everystate
as in the example, a more general approach can be used. Note that this uses the original definition ofnewData
, not the one in the first edit.First 15 lines of this output:
A solution with just
base
R. The format of the output is different, but all the values are right there.Here's my attempt:
JD Long edit
I was inspired enough to work out an
adply()
option:Maybe I'm missing something, but I believe
lmList
is the ideal tool here,You need to use
mdply
to supply both the model and the data to each function call: