I have a formula that contains some terms and a data frame (the output of an earlier model.frame() call) that contains all of those terms and some more. I want the subset of the model frame that contains only the variables that appear in the formula.
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4,
                 y=1:4,
                 z=1:4,
                 check.names=FALSE)
The desired result is fr minus the z column (fr[,1:4] is cheating -- I need a programmatic solution ...)
Some strategies that don't work:
fr[all.vars(ff)]
## Error in `[.data.frame`(fr, all.vars(ff)) : undefined columns selected
(because all.vars() returns "Reaction", not "log(Reaction)")
stripwhite <- function(x) gsub("(^ +| +$)","",x)
vars <- stripwhite(unlist(strsplit(as.character(ff)[-1],"\\+")))
fr[vars]
## Error in `[.data.frame`(fr, vars) : undefined columns selected
(because splitting on + spuriously splits the log(1+Days) term).
I've been thinking about walking down the parse tree of the formula:
ff[[3]] ## log(1 + Days) + x + y
ff[[3]][[1]] ## `+`
ff[[3]][[2]] ## log(1 + Days) + x
but I haven't got a solution put together, and it seems like I'm going down a rabbit hole. Ideas?
It looks to me like the only problem is the lack of a space in the name of the second column of fr. Rename it with a space and pull the columns in this way:
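A minimal sketch of that idea, assuming the only mismatch is the spacing inside log(1+Days). R deparses that term as "log(1 + Days)", so the column is renamed to match; the names wanted and sub_fr are illustrative only.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# Rename the column so it matches the spelling R produces when it
# deparses the term (with spaces around the +).
names(fr)[names(fr) == "log(1+Days)"] <- "log(1 + Days)"

# Response (left-hand side) plus the term labels of the right-hand side.
wanted <- c(deparse(ff[[2]]), attr(terms(ff), "term.labels"))

sub_fr <- fr[wanted]   # everything except z
```

This relies on every term label matching a column name exactly once the spacing agrees, which holds for this example but is not guaranteed for formulas with interactions.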
If you believe the only difference between the two will always be that the names in fr and the names in ff differ in their spacing, then the above solution holds. I like labels(terms(x)) a bit more, though, because it seems a bit more abstract.
This should work:
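One way the labels(terms()) approach could look, as a sketch rather than a definitive version: instead of renaming columns, whitespace is stripped from both sides before matching. The helper squash() is hypothetical, and the response is added separately because the term labels cover only the right-hand side.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# Hypothetical helper: remove all whitespace so "log(1 + Days)"
# and "log(1+Days)" compare equal.
squash <- function(s) gsub("[[:space:]]+", "", s)

# Response plus right-hand-side term labels.
wanted <- c(deparse(ff[[2]]), labels(terms(ff)))

# Match on the whitespace-free spellings; fail loudly if any is missing.
idx <- match(squash(wanted), squash(names(fr)))
stopifnot(!anyNA(idx))

sub_fr <- fr[idx]   # original column names are preserved
```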
And props to Roman Luštrik for pointing me in the right direction.
Edit: Looks like you could pull it out of the "variables" attribute as well:
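A sketch of the "variables" route, under the same whitespace assumption as before. The attribute is an unevaluated call to list() that includes the response, so no separate handling of the left-hand side is needed; var_names and sub_fr are illustrative names.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# An unevaluated call: list(log(Reaction), log(1 + Days), x, y)
vars <- attr(terms(ff), "variables")

# Drop the leading `list` symbol and deparse each variable to a string.
var_names <- vapply(as.list(vars)[-1], deparse, character(1))

# Compare with spaces removed on both sides.
idx <- match(gsub(" ", "", var_names, fixed = TRUE),
             gsub(" ", "", names(fr), fixed = TRUE))
sub_fr <- fr[idx]
```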
Edit 2: Found first problem case, involving I() or offset():
Those would be pretty easy to correct with regex, though. BUT, if you had situations like in the question where a variable is called, e.g., log(x) and is used in a formula alongside something like I(log(y)) for variable y, this will get really messy.
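One way the offset() case can bite, as a hedged illustration only: the formula f2 below is made up for this sketch and is an assumption about the kind of case meant. Term labels leave offset terms out, while the "variables" attribute keeps them.

```r
# Hypothetical formula with both an I() term and an offset.
f2 <- y ~ x + I(x^2) + offset(z)
tt <- terms(f2)

# Term labels skip the offset entirely.
term_labels <- labels(tt)                     # "x" "I(x^2)"

# The "variables" attribute keeps it, alongside the response.
var_names <- vapply(as.list(attr(tt, "variables"))[-1],
                    deparse, character(1))    # "y" "x" "I(x^2)" "offset(z)"

# Position of the offset within the variables.
offset_pos <- attr(tt, "offset")

# all.vars() goes the other way and strips the wrappers.
bare <- all.vars(f2)                          # "y" "x" "z"
```

So a selection built from term labels would silently drop an offset(z) column, whereas one built from the "variables" attribute would keep it.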