Proper method to append to a formula where both fo

Posted 2019-07-21 04:59

Question:

I've done a fair amount of reading here on SO and learned that I should generally avoid manipulation of formula objects as strings, but I haven't quite found how to do this in a safe manner:

tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
  # Arguments are unquoted and in the typical form for lm etc
  # Do some plotting with lattice using formula & groups (works, not shown)
  # Append 'groups' to 'formula':
  # Change y ~ x as passed in argument 'formula' to
  # y ~ x * gr where gr is the argument 'groups' with
  # scoping so it will be understood by aov
  new_formula <- y ~ x * gr
  # Now do some anova (could do if formula were right)
  model <- aov(formula = new_formula, data = data)
  # And print the aov table on the plot (can do)
  print(summary(model)) # this will do for testing
}

Perhaps the closest I came was to use reformulate but that only gives + on the RHS, not *. I want to use the function like this:

p <- tf(carat ~ color, groups = clarity, data = diamonds)

and have the aov results for carat ~ color * clarity. Thanks in Advance.

Solution

Here is a working version based on @Aaron's comment which demonstrates what's happening:

tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
  print(deparse(substitute(groups)))
  f <- paste(".~.*", deparse(substitute(groups)))
  new_formula <- update.formula(formula, f)
  print(new_formula)
  model <- aov(formula = new_formula, data = data)
  print(summary(model))
}
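To see the two steps in isolation, here is a minimal sketch that returns the intermediate values instead of fitting a model. The helper name show_steps is hypothetical and not part of the original post. No data is needed because the formula is only built, never evaluated. Note that update.formula simplifies its result, so the interaction comes back expanded into main effects plus the interaction term rather than written with *.

```r
# Hypothetical helper (not from the original post) that exposes each step.
show_steps <- function(formula, groups) {
  # substitute() captures the unevaluated argument; deparse() turns it into a string
  grp_name <- deparse(substitute(groups))
  # ". ~ . * clarity": keep the old LHS and RHS, cross the RHS with the group
  template <- paste(". ~ . *", grp_name)
  # update() dispatches to update.formula, which accepts the string template
  new_formula <- update(formula, template)
  list(name = grp_name, template = template, formula = new_formula)
}

steps <- show_steps(carat ~ color, clarity)
print(steps$name)
print(steps$template)
# The term labels show that both main effects and the interaction are present
print(attr(terms(steps$formula), "term.labels"))
```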

Answer 1:

I think update.formula can solve your problem, but I've had trouble with update inside function calls. It works as coded below, but note that I'm passing the grouping column itself (dat$clarity), not the variable name. The function adds that column to its copy of the data under the name groups, and then update works.

I also don't know whether it does exactly what you want for the second formula (carat ~ color + color2), but take a look at the help file for update.formula and experiment with it a bit.

http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html

tf <- function(formula, groups, d) {
  # 'groups' is a vector of values here; store it as a column literally named "groups"
  d$groups <- groups
  newForm <- update(formula, ~ . * groups)
  lm(newForm, data = d)
}

dat <- data.frame(carat = rnorm(10, 0, 1), color = rnorm(10, 0, 1),
                  color2 = rnorm(10, 0, 1), clarity = rnorm(10, 0, 1))
m <- tf(carat ~ color, dat$clarity, d = dat)
m2 <- tf(carat ~ color + color2, dat$clarity, d = dat)

tf2 <- function(formula, group, d) {
  # 'group' is an unquoted column name; deparse(substitute()) recovers it as a string
  f <- paste(".~.*", deparse(substitute(group)))
  newForm <- update.formula(formula, f)
  lm(newForm, data = d)
}
mA <- tf2(carat ~ color, clarity, d = dat)
m2A <- tf2(carat ~ color + color2, clarity, d = dat)

EDIT: As @Aaron pointed out, it's deparse and substitute that solve my problem. I've added tf2 to the code example as the better option so you can see how both work.
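Since the question asks how to avoid treating formulas as strings, here is a sketch of one alternative that builds the update template as a language object with bquote instead of paste. This is not the approach from either answer; the name tf3 and the seeded toy data are assumptions made for illustration only.

```r
# Hypothetical variant (tf3 is not from the original post): no string pasting.
tf3 <- function(formula, group, d) {
  grp <- substitute(group)                   # the symbol `clarity`, unevaluated
  template <- eval(bquote(. ~ . * .(grp)))   # the formula  . ~ . * clarity
  new_form <- update(formula, template)
  lm(new_form, data = d)
}

set.seed(1)  # toy data, for illustration only
dat3 <- data.frame(carat = rnorm(10), color = rnorm(10), clarity = rnorm(10))
m3 <- tf3(carat ~ color, clarity, d = dat3)
print(names(coef(m3)))
```

Because grp stays a symbol throughout, there is no deparse/parse round trip, which is the step that can go wrong when an argument is a long or unusual expression.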



Answer 2:

One technique I use when I have trouble with scoping and calling functions within functions is to pass the parameters as strings and then construct the call within the function from those strings. Here's what that would look like here.

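tf <- function(formula, data, groups) {
  # All three arguments are strings: formula text, data set name, grouping variable name
  f <- paste(".~.*", groups)
  # Build the aov call explicitly so 'data' is looked up by name, then evaluate it
  m <- eval(call("aov", update.formula(as.formula(formula), f), data = as.name(data)))
  summary(m)
}

tf("mpg~vs", "mtcars", "am")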
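To make the mechanism visible, the sketch below builds the same call outside of any function and inspects it before evaluating. The variable names f, cl and m are chosen here for illustration; the built-in mtcars data set is the one used in the answer's own example.

```r
# Build the updated formula from strings, as in the answer above
f <- update.formula(as.formula("mpg~vs"), ".~.*am")

# call() creates an unevaluated call; as.name() makes 'data' refer to mtcars by name
cl <- call("aov", f, data = as.name("mtcars"))
print(cl)

# Nothing is fitted until the call is evaluated
m <- eval(cl)
print(summary(m))
```

Printing cl before eval shows exactly what aov will be asked to run, which is handy when debugging scoping problems.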

See this answer to one of my previous questions for another example of this: https://stackoverflow.com/a/7668846/210673.

Also see this answer to the sister question of this one, where I suggest something similar for use with xyplot: https://stackoverflow.com/a/14858661/210673



Tags: r, formula