Working with R, this is a real WTF:
R> f_string <- 'Sepal.Length ~ Sepal.Width'
R> l <- with(iris, lm(as.formula(f_string))) # works fine
R> f_formula <- as.formula(f_string)
R> l <- with(iris, lm(f_formula))
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found
Why does as.formula have to be inside the lm() call? I get it that this is a question about which environment things are evaluated in, because this works:
R> f_formula <- with(iris, as.formula(f_string))
R> lm(f_formula)
but I'm having real trouble wrapping my head around why one works and the other one doesn't.
Your failing example fails because you are creating the formula in the global environment, so that is where lm goes looking for the variables:
> f_formula <- as.formula(f_string)
> l <- with(iris, lm(f_formula))
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found
> str(f_formula)
Class 'formula' length 3 Sepal.Length ~ Sepal.Width
..- attr(*, ".Environment")=<environment: R_GlobalEnv>
and there's no Sepal.Length there. If you create the appropriate objects in the global environment you can make it work:
> Sepal.Length=1:10
> Sepal.Width=runif(10)
> l <- with(iris, lm(f_formula)) # "works" (ie doesn't error)
But that completely ignores the iris data: the model is fitted to the ten-element vectors you just created in the global environment. Welcome to the world of annoying R behaviour.
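To see the difference directly, here is a minimal sketch that builds the same formula both ways and compares the environments they capture. The names f_global and f_iris are mine, not from the question, and "top level" assumes you run this at the console or in a plain script.

```r
f_string <- "Sepal.Length ~ Sepal.Width"

# Built at top level: the formula captures the environment it was created in.
f_global <- as.formula(f_string)

# Built inside with(): the calling environment is the temporary one that
# with() constructs from the columns of iris.
f_iris <- with(iris, as.formula(f_string))

# Inspect what each formula captured.
environment(f_global)
environment(f_iris)

# Only the second environment holds the iris columns itself.
has_col_iris <- exists("Sepal.Length", envir = environment(f_iris),
                       inherits = FALSE)
has_col_iris
```

environment() is just a more readable way of getting at the ".Environment" attribute that str() prints.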
The other examples all compute the formula object inside with(iris, ...), which uses the iris data frame as an environment. If you debug lm and take a look at what formula is in one of your working cases:
Browse[2]> str(formula)
Class 'formula' length 3 Sepal.Length ~ Sepal.Width
..- attr(*, ".Environment")=<environment: 0x9d590b4>
you'll see the environment is no longer the global one. If you want to see what's in that environment, get it from the formula's attributes and list:
Browse[2]> e = attr(formula,".Environment")
Browse[2]> with(e,ls())
[1] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width" "Species"
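If the goal is just to fit the model from a string, one way to sidestep the whole issue is to hand the data frame to lm through its data argument rather than wrapping the call in with(). This is a sketch of that alternative, not something from the question. lm looks up the variables in data first and only falls back to the formula's environment for anything it doesn't find there.

```r
f_string  <- "Sepal.Length ~ Sepal.Width"

# Created outside iris, exactly as in the failing example.
f_formula <- as.formula(f_string)

# The variables are taken from `data`, so where the formula was
# created no longer matters.
fit <- lm(f_formula, data = iris)
coef(fit)
```

This also keeps the name of the data frame in the stored call, which makes things like update() and predict() behave more predictably later on.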