I'm having trouble figuring out how to effective map across multiple parameters and variables within a tbl to generate new variables.
In the "real" version, I basically have one mathematical function generating a central estimate, and I need to run a whole series of sensitivity tests varying different parameters. I'm trying to figure out how to do this within the tidyverse. It looks like map() and mutate() are the answers to this, but I'm having trouble.
# building the practice dataset
pracdf <- tibble(ID = letters,
p = runif(26, 100, 1000),
med.a = runif(26),
med.b = runif(26),
c = runif(26))
pracdf <- pracdf %>%
mutate(low.a = med.a * 0.8,
low.b = med.b * 0.8,
high.a = med.a * 1.2,
high.b = med.b * 1.2)
# this generates a few low/med/high values for variables
# the function
pracdf <- pracdf %>% mutate(d = p * med.a * med.b * c)
# works as expected. Now can I loop it with dynamic variable names?
f1 <- function(df, var.a) {
var.a <- enquo(var.a)
print(var.a)
d.name <- paste0("d.", quo_name(var.a))
print(d.name)
df %>% mutate(!!d.name := p * (!!var.a) * c)
}
pracdf2 <- f1(pracdf, med.a)
# works great! Eventually I want to loop through low, med, high. Start with a loop of 1
pracdf3 <- map(list(med.a), f1, df = pracdf)
# loop crashes spectacularly
pracdf3 <- map(list(med.a), ~f1, df = pracdf)
# failure
pracdf3 <- map(med.a, ~f1, df = pracdf)
# what am I doing with my life
I think one of the issues making this task difficult is the current set up might not be very "tidy". E.g. low.a
, low.b
, med.a
etc appear to be examples of what I understand to be 'untidy' columns.
Below is one possible approach (which I am fairly sure can probably be improved) which doesn't use a for loop or custom function at all. The key idea is to take the initial pracdf
and expand the existing rows so there is one row for each "level" (i.e., low, med, and high). Doing this lets us calculate d
in a single step with no for loops for low, med, and high.
(Edited for readability and to include Jens Leerssen's suggestions)
library(dplyr)
library(tidyr)
set.seed(123)
pracdf <- tibble(ID = letters,
p = runif(26, 100, 1000),
a = runif(26),
b = runif(26),
c = runif(26))
levdf <- tibble(level = c("low", "med", "high"),
level_val = c(0.8, 1.0, 1.2))
tidy_df <- pracdf %>% merge(levdf) %>%
mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
select(-level_val) %>% arrange(ID) %>% as_tibble()
tidy_df
#> # A tibble: 78 x 7
#> ID p a b c level d
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 a 358.8198 0.5440660 0.7989248 0.3517979 low 35.116168
#> 2 a 358.8198 0.5440660 0.7989248 0.3517979 med 54.869013
#> 3 a 358.8198 0.5440660 0.7989248 0.3517979 high 79.011379
#> 4 b 809.4746 0.5941420 0.1218993 0.1111354 low 4.169914
#> 5 b 809.4746 0.5941420 0.1218993 0.1111354 med 6.515490
#> 6 b 809.4746 0.5941420 0.1218993 0.1111354 high 9.382306
#> 7 c 468.0792 0.2891597 0.5609480 0.2436195 low 11.837821
#> 8 c 468.0792 0.2891597 0.5609480 0.2436195 med 18.496595
#> 9 c 468.0792 0.2891597 0.5609480 0.2436195 high 26.635096
#> 10 d 894.7157 0.1471136 0.2065314 0.6680556 low 11.622957
#> # ... with 68 more rows
However, the result above might not be the format you want the final data in. But we can take care of this by doing some gathering and spreading of tidy_df
using tidyr::gather
and tidyr::spread
.
tidy_df %>%
gather(variable, value, a, b, d) %>%
unite(level_variable, level, variable) %>%
spread(level_variable, value)
#> # A tibble: 26 x 12
#> ID p c high_a high_b high_d low_a
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 358.8198 0.3517979 0.54406602 0.79892485 79.011379 0.54406602
#> 2 b 809.4746 0.1111354 0.59414202 0.12189926 9.382306 0.59414202
#> 3 c 468.0792 0.2436195 0.28915974 0.56094798 26.635096 0.28915974
#> 4 d 894.7157 0.6680556 0.14711365 0.20653139 26.151654 0.14711365
#> 5 e 946.4206 0.4176468 0.96302423 0.12753165 69.905442 0.96302423
#> 6 f 141.0008 0.7881958 0.90229905 0.75330786 108.778072 0.90229905
#> 7 g 575.2949 0.1028646 0.69070528 0.89504536 52.681362 0.69070528
#> 8 h 903.1771 0.4348927 0.79546742 0.37446278 168.480110 0.79546742
#> 9 i 596.2915 0.9849570 0.02461368 0.66511519 13.845603 0.02461368
#> 10 j 510.9533 0.8930511 0.47779597 0.09484066 29.775361 0.47779597
#> # ... with 16 more rows, and 5 more variables: low_b <dbl>, low_d <dbl>,
#> # med_a <dbl>, med_b <dbl>, med_d <dbl>
Consider a vectorized approach (forgive me for non-tidyverse data wrangling) where all new columns can be handled in one call. Use seed(888)
before random data to reproduce output:
f1 <- function(df, vars) {
df[paste0("d.", vars)] <- df$p * df[vars] * df$c
return(df)
}
newpracdf <- f1(pracdf, c("low.a","high.a","med.a","med.b","low.b","high.b"))
Output
# # A tibble: 26 x 15
# ID p med.a med.b c low.a low.b high.a high.b d.low.a d.high.a d.med.a d.med.b d.low.b d.high.b
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 122.9573 0.65746601 0.43123587 0.81314570 0.52597281 0.3449887 0.7889592 0.5174830 52.587917 78.881876 65.734897 43.115909 34.492727 51.739091
# 2 b 412.0127 0.19793909 0.77148952 0.26039116 0.15835127 0.6171916 0.2375269 0.9257874 16.988630 25.482945 21.235787 82.768834 66.215068 99.322601
# 3 c 155.1248 0.30834064 0.99850558 0.57853922 0.24667251 0.7988045 0.3700088 1.1982067 22.137823 33.206735 27.672279 89.611689 71.689351 107.534027
# 4 d 715.3769 0.85517040 0.81715464 0.84196723 0.68413632 0.6537237 1.0262045 0.9805856 412.071636 618.107455 515.089546 492.191742 393.753393 590.630090
# 5 e 790.5284 0.12617255 0.59290522 0.54879020 0.10093804 0.4743242 0.1514071 0.7114863 43.790379 65.685568 54.737973 257.222588 205.778070 308.667105
# 6 f 193.6968 0.15173488 0.93054996 0.08587380 0.12138791 0.7444400 0.1820819 1.1166600 2.019104 3.028655 2.523879 15.478286 12.382629 18.573943
# 7 g 451.6000 0.88123996 0.62858787 0.12546384 0.70499197 0.5028703 1.0574880 0.7543054 39.944473 59.916709 49.930591 35.615457 28.492365 42.738548
# 8 h 342.3741 0.09952918 0.56932309 0.10980862 0.07962334 0.4554585 0.1194350 0.6831877 2.993489 4.490234 3.741861 21.404056 17.123245 25.684867
# 9 i 143.9489 0.42407685 0.94929822 0.02754267 0.33926148 0.7594386 0.5088922 1.1391579 1.345083 2.017624 1.681353 3.763718 3.010975 4.516462
# 10 j 911.8069 0.25822441 0.08934875 0.55244369 0.20657953 0.0714790 0.3098693 0.1072185 104.058645 156.087967 130.073306 45.006930 36.005544 54.008316
# # ... with 16 more rows