I have something along the lines of
y ~ x + z
And I would like to transform it to
y ~ x_part1 + x_part2 + z
More generally, I would like to have a function that takes a formula and returns that formula with all terms that match "^x$" replaced by "x_part1" and "x_part2". Here's my current solution, but it just feels so kludgey...
my.formula <- fruit ~ apple + banana
var.to.replace <- 'apple'
my.terms <- labels(terms(my.formula))
new.terms <- paste0('(',
                    paste0(var.to.replace,
                           c('_part1', '_part2'),
                           collapse = '+'),
                    ')')
new.formula <- reformulate(termlabels = gsub(pattern = var.to.replace,
                                             replacement = new.terms,
                                             x = my.terms),
                           response = my.formula[[2]])
An additional caveat is that the input formula may be specified with interactions.
y ~ b*x + z
should output one of these (equivalent) formulae
y ~ b*(x_part1 + x_part2) + z
y ~ b + (x_part1 + x_part2) + b:(x_part1 + x_part2) + z
y ~ b + x_part1 + x_part2 + b:x_part1 + b:x_part2 + z
MrFlick has advocated the use of
substitute(y ~ b*x + z, list(x=quote(x_part1 + x_part2)))
but when I have stored the formula I want to modify in a variable, as in
my.formula <- fruit ~ x + banana
This approach seems to require a little more massaging:
substitute(my.formula, list(x=quote(apple_part1 + apple_part2)))
# my.formula
The necessary change to that approach was:
do.call(what = 'substitute',
        args = list(my.formula, list(x=quote(x_part1 + x_part2))))
But I can't figure out how to use this approach when both 'x' and c('x_part1', 'x_part2') are stored in variables, e.g. var.to.replace and new.terms above.
If you just want to modify main effects, you can subtract x, and add in the two new variables.
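A minimal sketch of that subtract-and-add idea, assuming base R's update() is the intended tool (the answer does not name it) and using the variable names from the question:

```r
f <- y ~ x + z

# "." stands for the existing left-hand and right-hand sides;
# remove the main effect x and add the two replacement terms.
f_new <- update(f, . ~ . - x + x_part1 + x_part2)

# Inspect the resulting term labels.
labels(terms(f_new))
```

Note that this only touches the main effect. With y ~ b*x + z, subtracting x would still leave the b:x interaction in place, so it does not cover the interaction case from the question.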
You can use the substitute function for this.

Here we use the named list to tell R to replace the variable x with the expression x_part1 + x_part2.
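A sketch of what this could look like, including the case from the question where the formula, the variable name and the replacement names are all stored in variables. The Reduce() and setNames() steps are my own assumption about how to build the named list programmatically, not something taken from the answer:

```r
# Literal formula: substitute() works on the expression as written.
f1 <- substitute(y ~ b*x + z, list(x = quote(x_part1 + x_part2)))

# Stored formula, with the names held in variables (as in the question).
my.formula     <- y ~ b*x + z
var.to.replace <- "x"
new.terms      <- c("x_part1", "x_part2")

# Build the call x_part1 + x_part2 from the character vector.
replacement <- Reduce(function(lhs, rhs) call("+", lhs, rhs),
                      lapply(new.terms, as.name))

# The list is named after the variable being replaced. do.call() evaluates
# my.formula first, so substitute() receives the formula, not the symbol.
new.formula <- do.call("substitute",
                       list(my.formula,
                            setNames(list(replacement), var.to.replace)))

# Make sure the result is a formula object rather than a bare call.
new.formula <- as.formula(new.formula)
```

Because the replacement is spliced in as a single sub-expression, b*x becomes b times the whole sum, which is the first of the equivalent forms listed in the question.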
How about working with the formula as a string? Many base R models like lm() accept string formulas (and you can always use formula() otherwise). In this case, you can use something like gsub(). For example, with the mtcars data set, say we want to replace mpg (x) with disp + hp (x_part1 + x_part2).

You can write a recursive function to modify the expression tree of the formula:
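One way such a recursive function might look. This is a sketch, not necessarily the code the answer had in mind; replace_var and replace_in_formula are hypothetical names:

```r
# Hypothetical helper: walk the expression tree and swap every symbol
# whose name is exactly `var` for the expression `replacement`.
replace_var <- function(expr, var, replacement) {
  if (is.name(expr)) {
    # Leaf node: replace only on an exact name match.
    if (identical(as.character(expr), var)) replacement else expr
  } else if (is.call(expr)) {
    # Inner node: recurse into the function name and every argument.
    as.call(lapply(expr, replace_var, var = var, replacement = replacement))
  } else {
    # Constants (numbers, strings) are returned unchanged.
    expr
  }
}

# Hypothetical wrapper: rebuild a formula from the modified call,
# keeping the environment of the original formula.
replace_in_formula <- function(f, var, replacement) {
  eval(replace_var(f, var, replacement), environment(f))
}

out <- replace_in_formula(y ~ x + z, "x", quote(x_part1 + x_part2))
out
```

Since the match is on whole symbols, a variable such as xx or x_other is left alone, which a plain gsub() on "x" would not guarantee. The sketch walks the left-hand side too, so it would also replace the response if it had the same name.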
You can use it to modify, e.g., interactions:
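A self-contained sketch of the interaction case, using a compact recursive replacer (replace_var is a hypothetical name) on the y ~ b*x + z example from the question:

```r
# Compact recursive replacer (hypothetical helper): swap the symbol `var`
# for `replacement` anywhere in the expression tree.
replace_var <- function(expr, var, replacement) {
  if (is.name(expr) && identical(as.character(expr), var)) return(replacement)
  if (is.call(expr)) return(as.call(lapply(expr, replace_var, var, replacement)))
  expr
}

f <- y ~ b*x + z

# Evaluating the modified `~` call turns it back into a formula.
g <- eval(replace_var(f, "x", quote(x_part1 + x_part2)), environment(f))

# terms() expands b * (x_part1 + x_part2) into main effects and interactions.
labels(terms(g))
```

The replacement is inserted as one sub-expression, so the expanded terms should contain both new main effects plus an interaction of b with each of them.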