I've got a data frame containing a vector of x values, a vector of y values, and a vector of IDs:
x <- rep(0:3, 3)
y <- runif(12)
ID <- c(rep("a", 4), rep("b", 4), rep("c", 4))
df <- data.frame(ID=ID, x=x, y=y)
I'd like to fit a separate lm to each subset of x's and y's sharing the same ID. The following code gets the job done:
a.lm <- lm(x~y, data=subset(df, ID=="a"))
b.lm <- lm(x~y, data=subset(df, ID=="b"))
c.lm <- lm(x~y, data=subset(df, ID=="c"))
However, this is very brittle (future data sets might have different IDs) and un-vectorized. I'd also like to store all the lms in a single data structure. There must be an elegant way to do this, but I can't find it. Any help?
Use some of the magic in the plyr package. The function dlply takes a data.frame, splits it, applies a function to each element, and combines the results into a list. This is perfect for your application.

Splitting df by ID with dlply and fitting lm to each piece creates a list with a model for each subset of ID.
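A minimal sketch of what that dlply call might look like, using the df from the question. The formula x ~ y is copied from the question; the set.seed call and the name models are additions for illustration, not part of the original post.

```r
library(plyr)

# Example data from the question (set.seed added only so the sketch is reproducible)
set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# Split df by ID, fit one lm per piece, and collect the fits in a list
# ("models" is a name chosen for this sketch)
models <- dlply(df, "ID", function(d) lm(x ~ y, data = d))

# The list elements are named after the ID values
names(models)
```

Because the split is driven by the values in the ID column, nothing needs to change if a future data set has different IDs.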
This means you can subset the list and work with that. For example, to get the coefficients for your lm model where ID=="a":
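As a sketch, assuming the fits were built with dlply as described above and stored under the hypothetical name models, the coefficients can be pulled out by element name:

```r
library(plyr)

# Rebuild the question's data and the list of fits so this sketch runs on its own
# (set.seed and the name "models" are additions for illustration)
set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)
models <- dlply(df, "ID", function(d) lm(x ~ y, data = d))

# Coefficients (intercept and slope on y) of the model fitted to ID == "a"
coef_a <- coef(models[["a"]])
coef_a

# Coefficients of every model at once: one column per ID
all_coefs <- sapply(models, coef)
all_coefs
```

Any function that accepts an lm object, such as summary or predict, can be applied to a list element in the same way.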
How about ?
Using base functions, you can split your original data frame and use lapply on that:
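One way this could look, as a sketch on the question's df (the set.seed call and the name models are additions for illustration):

```r
# Example data from the question (set.seed added only so the sketch is reproducible)
set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# split() returns a named list of data frames, one per ID;
# lapply() then fits the same model to each piece
models <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

names(models)
coef(models$a)
```

The result is a plain named list, so it needs no extra packages and can be indexed with models$a or models[["a"]].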