I'm having trouble figuring out how to effectively map across multiple parameters and variables within a tbl to generate new variables.
In the "real" version, I basically have one mathematical function generating a central estimate, and I need to run a whole series of sensitivity tests varying different parameters. I'm trying to figure out how to do this within the tidyverse. It looks like map() and mutate() are the answers to this, but I'm having trouble.
```r
library(tidyverse)

# building the practice dataset
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 med.b = runif(26),
                 c = runif(26))

pracdf <- pracdf %>%
  mutate(low.a = med.a * 0.8,
         low.b = med.b * 0.8,
         high.a = med.a * 1.2,
         high.b = med.b * 1.2)
# this generates a few low/med/high values for variables

# the function
pracdf <- pracdf %>% mutate(d = p * med.a * med.b * c)
# works as expected. Now can I loop it with dynamic variable names?

f1 <- function(df, var.a) {
  var.a <- enquo(var.a)
  print(var.a)
  d.name <- paste0("d.", quo_name(var.a))
  print(d.name)
  df %>% mutate(!!d.name := p * (!!var.a) * c)
}

pracdf2 <- f1(pracdf, med.a)
# works great! Eventually I want to loop through low, med, high. Start with a loop of 1

pracdf3 <- map(list(med.a), f1, df = pracdf)
# loop crashes spectacularly

pracdf3 <- map(list(med.a), ~f1, df = pracdf)
# failure

pracdf3 <- map(med.a, ~f1, df = pracdf)
# what am I doing with my life
```
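In the attempts above, `map()` evaluates `med.a` as an ordinary R object before `f1()` ever sees it, and no such object exists outside `pracdf`. One way around this, sketched here under the assumption that passing column names as strings is acceptable, is to look the column up with the `.data` pronoun and let `purrr::reduce()` add one column per name. `f1_chr()` is a hypothetical variant of `f1()`, not part of the original code:

```r
library(dplyr)
library(purrr)

set.seed(1)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 med.b = runif(26),
                 c = runif(26)) %>%
  mutate(low.a = med.a * 0.8,
         high.a = med.a * 1.2)

# Hypothetical string-based variant of f1(): the column is named by a
# character string and looked up with the .data pronoun, so nothing tries
# to evaluate `med.a` in the global environment
f1_chr <- function(df, var.a) {
  d.name <- paste0("d.", var.a)
  df %>% mutate(!!d.name := p * .data[[var.a]] * c)
}

# reduce() passes the growing tibble back into f1_chr() for each name,
# adding d.low.a, d.med.a and d.high.a in turn
pracdf3 <- reduce(c("low.a", "med.a", "high.a"), f1_chr, .init = pracdf)
```

Using `map()` instead of `reduce()` here would return a list of three separate tibbles, each with one new column; `reduce()` threads a single tibble through all three calls.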
I think one of the issues making this task difficult is that the current setup might not be very "tidy". E.g., `low.a`, `low.b`, `med.a`, etc. appear to be examples of what I understand to be 'untidy' columns.

Below is one possible approach (which can probably be improved) that doesn't use a for loop or custom function at all. The key idea is to take the initial `pracdf` and expand the existing rows so there is one row for each "level" (i.e., low, med, and high). Doing this lets us calculate `d` for low, med, and high in a single step, with no for loops. (Edited for readability and to include Jens Leerssen's suggestions)
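A minimal sketch of the same idea, assuming tidyr 1.0 or later: `pivot_longer()` (the successor to `gather()`) splits names such as `low.a` into a `level` column plus value columns `a` and `b`, so a single `mutate()` computes `d` for every level. The optional last step uses `pivot_wider()` (the successor to `spread()`) to return to one row per `ID`; the name `wide_df` is illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(1)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 med.b = runif(26),
                 c = runif(26)) %>%
  mutate(low.a = med.a * 0.8, low.b = med.b * 0.8,
         high.a = med.a * 1.2, high.b = med.b * 1.2)

# Split names like "low.a" at the literal dot: "low" goes to `level`,
# "a"/"b" become value columns, giving one row per ID x level
tidy_df <- pracdf %>%
  pivot_longer(cols = matches("^(low|med|high)\\.[ab]$"),
               names_to = c("level", ".value"),
               names_sep = "\\.") %>%
  mutate(d = p * a * b * c)   # one formula covers low, med and high

# Optional: back to one row per ID, with columns d.med, d.low, d.high
wide_df <- tidy_df %>%
  select(ID, level, d) %>%
  pivot_wider(names_from = level, values_from = d, names_prefix = "d.")
```

Keeping the data long makes adding another sensitivity level a matter of adding rows rather than writing another `mutate()` line.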
However, the result above might not be the format you want the final data in. We can take care of this by gathering and spreading `tidy_df` using `tidyr::gather` and `tidyr::spread`.

Consider a vectorized approach (forgive me for non-tidyverse data wrangling) where all new columns can be handled in one call. Use `set.seed(888)` before generating the random data to reproduce the output.
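The answer's code and printed output are not reproduced here. A minimal base-R sketch of the vectorized idea, assuming the same column names as the question; `lvls`, `a_cols` and `b_cols` are illustrative names, not taken from the original answer:

```r
set.seed(888)

# Same practice data as the question, built as a plain data.frame
pracdf <- data.frame(ID = letters,
                     p = runif(26, 100, 1000),
                     med.a = runif(26),
                     med.b = runif(26),
                     c = runif(26))
pracdf[c("low.a", "low.b")]   <- pracdf[c("med.a", "med.b")] * 0.8
pracdf[c("high.a", "high.b")] <- pracdf[c("med.a", "med.b")] * 1.2

# Matching column names for each level (hypothetical helper vectors)
lvls   <- c("low", "med", "high")
a_cols <- paste0(lvls, ".a")
b_cols <- paste0(lvls, ".b")

# Data-frame arithmetic is element-wise, and a length-26 vector is recycled
# down every column, so d.low, d.med and d.high are created in one assignment
pracdf[paste0("d.", lvls)] <- pracdf$p * pracdf[a_cols] * pracdf[b_cols] * pracdf$c
```

The new columns are named by the left-hand index, so the result does not depend on the names of the intermediate data frame.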