Passing a character vector as arguments to a funct

Posted 2019-04-02 12:37

Question:

I suspect I'm Doing It Wrong, but I'd like to pass a character vector as an argument to a function in ddply. There's a lot of Q&A on removing quotes, etc., but none of it seems to work for me (e.g. Remove quotes from a character vector in R and http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html).

library(plyr)  # provides ddply() and .()

# reproducible data
df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d")))
df2<-data.frame(a=sample(1:50,9),b=sample(1:50,9),c=sample(1:50,9),d=(c("e","f","g","e","e","f","f","e","g")))
df3<-data.frame(a=sample(1:50,8),b=sample(1:50,8),c=sample(1:50,8),d=(c("h","i","j","h","h","i","i","h")))

#make a list
list.1<-list(df1=df1,df2=df2,df3=df3)

# desired output
lapply(list.1, function(x)   ddply(x, .(d), function(x)  data.frame(am=mean(x$a), bm=mean(x$b), cm=mean(x$c))))

$df1
  d       am       bm       cm
1 a 31.00000 29.25000 18.50000
2 b 31.66667 24.33333 34.66667
3 c 18.50000  5.50000 24.50000
4 d 36.00000 39.00000 43.00000

$df2
  d       am       bm cm
1 e 18.25000 32.50000 18
2 f 27.66667 41.33333 24
3 g 25.00000  7.50000 42

$df3
  d       am       bm       cm
1 h 36.00000 25.00000 20.50000
2 i 25.33333 37.33333 24.33333
3 j 32.00000 32.00000 46.00000

But my actual use-case has many new columns and different types of calculations that I want to calculate in the ddply function. So I want to do something like:

# here's a simple version of a function that I want to send to ddply    
func <- "am=mean(x$a), bm=mean(x$b), cm=mean(x$c)"

# here's how I imagine it might work
lapply(list.1, function(x)   ddply(x, .(d), function(x)  data.frame(func)) )

# not the desired outcome... 
$df1
  d                                     func
1 a am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 b am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 c am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
4 d am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

$df2
  d                                     func
1 e am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 f am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 g am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

$df3
  d                                     func
1 h am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 i am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 j am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

I've tried noquote, deparse, eval(as.symbol()), do.call(data.frame, ...) and some of the methods here: https://github.com/hadley/devtools/wiki/Evaluation on func to no avail. The solution might be obvious at this point (ie. melt everything!), but in case it's not, here's a longer example that's closer to my use case:

# sample data
s <- 23 # number of samples
r <- 10 # number of runs per sample
el <- 17 # number of elements
mydata <- data.frame(ID = unlist(lapply(LETTERS[1:s], function(x) rep(x, r))),
                     run = rep(1:r, s))
# insert fake element data
mydata[letters[1:el]] <- lapply(1:el, function(i) rnorm(s*r, runif(1)*i^2))

# generate all combinations of 5 runs from  ten runs
su <- 5 # number of runs to sample from ten runs
idx <- combn(unique(mydata$run), su)

# RSE function
RSE <- function(x) {100*( (sd(x)/sqrt(length(x)))/mean(x) )}

# make a list of dfs for all samples for each combination of five runs
# to prepare to calculate RSEs
combys1 <- lapply(1:ncol(idx), function(i) mydata[mydata$run %in% idx[,i],] )

# make a list of dfs with RSE for each ID, for each combination of runs
combys2 <- lapply(1:length(combys1), function(i) ddply(combys1[[i]], "ID", summarise, RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c)))

I want to replace RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c) in the last line above with the object doRSE from here, to avoid lots of typing:

# prepare to calculate new columns with RSE and means
# (paste0 is vectorized, so no sapply is needed; paste0 has no sep argument)
cols <- names(mydata)[3:ncol(mydata)]
RSEs <- paste0("RSE", cols)
RSExs <- paste0("RSE(", cols, ")")
doRSE <- paste0(RSEs, "=", RSExs, collapse=",")

I'm open to solutions involving base, data.table and dirty tricks. Seems like these are close to what I want, but I can't quite translate them to my problem: Pass character argument and evaluate, Force evaluation of multiple variables using vector of character, Using a vector of characters that correspond to an expression as an argument to a function

UPDATE Here's the catch: I want to be able to modify the func in the simple example (or doRSE in my use-case) to create a bunch of new columns that result from various calculations on the existing columns to explore the data. I want a workflow that allows the resulting dataframes to have new columns that were not in the original dataframes. Sorry that wasn't more clear in the original question. I can't see how to adapt @Marius' answer to do this, but @mnel's is helpful (see update below)

Working through @mnel's excellent dirty tricks, with some minor fixes I can get the desired result on my use-case:

# @mnel's solution, adapted (no period before eval)
combys2 <- lapply(combys1, function(x) do.call(ddply,c(.data = quote(x), 
                           .variables = quote(.(ID)), .fun = quote(summarize),
                           eval(parse(text = sprintf('.(%s)', doRSE ))))))
head(combys2)

[[1]]
   ID       RSEa      RSEb     RSEc      RSEd     RSEe      RSEf     RSEg      RSEh      RSEi
1   A  168.30658  21.68632 5.657228  5.048057 4.162017 2.9581874 1.849009 0.6925148 0.4393491
2   B   26.55071  26.20427 4.782578  4.385409 2.342764 2.1813874 2.719625 1.1576681 0.6427935
3   C   73.83165  14.47216 8.154435  6.273202 3.046978 1.2179457 2.811405 1.1401837 0.8167067
4   D   31.96170  57.89260 9.438220  7.388410 3.755772 0.8601780 3.724875 0.8358204 0.9939387
5   E   63.22537  60.35532 5.839690 11.691304 3.828430 0.9217787 4.204300 0.8217187 0.7876634
6   F   56.37635  65.37907 4.149568  5.496308 2.227544 2.1548455 2.847291 1.1956212 0.2506518
7   G   69.32232  23.63214 4.255847  7.979225 4.917660 1.6185960 3.156521 0.3265555 0.8133279
8   H   29.82015  40.74184 7.372100  7.464792 2.749862 0.6054420 4.061368 0.9973909 1.3807720
9   I   50.58114  19.53732 2.989920  9.767678 4.000249 1.7451322 1.175397 0.9952093 0.9095086
10  J   92.96462  39.77475 6.140688 10.295668 3.407726 2.4663758 3.030444 0.5743419 0.9296482
11  K   90.72381  42.25092 2.483069  6.781054 3.142082 1.8080633 2.891740 1.1996176 0.8525290
12  L -385.24547  40.81267 4.506087  8.148382 2.976488 0.8304432 2.234134 0.2108664 0.4979777
13  M   22.77743  33.98332 2.913926  8.764639 2.307293 0.8366635 3.229944 1.0003125 0.3878567
14  N   66.75163  34.16087 6.611326 13.865377 1.285522 1.3863958 4.165575 0.7379386 0.4515194
15  O   37.37188 100.57479 5.738877  5.724862 2.839638 1.1366610 3.186332 0.7383855 0.3954544
16  P   17.08913  26.62210 6.060130  4.110893 2.688908 2.6970727 1.609043 1.3860834 0.8780010
17  Q   13.96392  74.92279 5.469304  8.467638 2.974131 1.2135436 3.284564 0.6232778 1.0759226
18  R   42.59899  30.75952 4.842832  8.764158 1.874020 1.5791048 3.427342 1.4479638 0.2964455
19  S   26.03307  15.56352 6.968717  7.783876 4.439733 2.0764179 4.683080 0.7459654 1.1268772
20  T   71.57945  33.81362 7.147049 11.201551 2.128315 2.2051611 2.419805 0.2688807 1.1559635
21  U   73.93002  11.77155 7.738910  7.207041 1.478491 1.4409844 4.042419 0.5883490 0.5585716
22  V   67.93166  39.54994 5.701551  8.636122 2.472963 1.6514199 2.627965 1.0359048 0.8747136
23  W   11.23057  12.51272 7.003448  7.424559 4.102693 0.6614847 2.246305 1.3422405 0.2665246
        RSEj      RSEk      RSEl      RSEm      RSEn      RSEo      RSEp      RSEq
1  0.6366733 0.3713819 2.1993487 0.3865293 0.5436581 0.9187585 0.4344699 0.8915868
2  0.3445095 0.2932025 1.8563179 0.5397595 1.0433388 0.3533622 0.1942316 0.1941072
3  0.2720344 0.5507595 2.0305726 0.4377259 0.8589854 0.5690906 0.1397337 0.4043247
4  0.6606667 0.6769112 3.4737352 0.5674656 1.2519256 0.8718298 0.1162969 0.8287504
5  0.4620774 0.5598069 1.9236112 0.7990046 0.9832732 0.6847352 0.4070675 0.9005185
6  0.7981610 0.4005493 0.9721068 0.2770989 1.7054674 0.3110139 0.4521183 0.8740444
7  0.3969116 0.4717575 4.1341106 0.7510628 0.9998299 0.5342292 0.4319642 1.1861705
8  0.2963956 0.2652221 0.4775827 0.2617120 0.8261874 0.5266087 0.1900943 0.2350553
9  0.2609359 0.5431035 2.6478440 0.1606919 0.7407281 0.6802262 0.1802069 0.7438792
10 0.4239787 0.8753544 3.4218030 0.5467869 0.7404017 0.5581173 0.3682014 0.6361436
11 0.4188502 0.8629862 4.4181479 0.1623873 0.8018811 0.5873609 0.3592134 0.5357984
12 0.5790265 0.5009210 3.7534287 0.1933726 0.5809601 0.5777868 0.3400925 0.4783890
13 0.3562582 0.2552756 2.1393219 0.1849345 0.5796194 0.6129469 0.3363311 0.4382125
14 0.7921502 0.6147990 2.9054634 0.5852325 1.4954072 0.9983203 0.2937837 0.7654504
15 0.5840424 0.2757707 1.5695675 0.3305385 0.8712636 0.5816490 0.1985457 0.7213289
16 0.3301280 0.3008273 2.9014987 0.4540833 0.5966479 0.9042004 0.1631630 0.7262141
17 0.5882511 0.2820978 3.0652666 0.4518936 1.3168151 0.4749311 0.2244693 0.6583083
18 0.4048816 0.3708787 3.2207478 0.2603412 1.3168318 0.3318745 0.3120436 0.6210711
19 0.4425123 0.3602076 3.7609863 0.5399527 0.8302572 0.3246904 0.1952143 0.2915325
20 0.5877835 0.6339015 1.6908570 0.3223056 0.5239339 0.6607198 0.2808094 0.3697380
21 0.4454056 0.7733354 4.3433420 0.4391075 0.5503594 0.5893406 0.2262403 0.2361512
22 0.9583940 0.6365843 3.0033951 0.6507968 0.8610046 0.6363198 0.2866719 0.5736855
23 0.4969730 0.3895182 2.0021608 0.3354475 1.4398250 0.7386870 0.2458906 0.3414804
...
...

Answer 1:

You can do some ugly computing on the language using quote and plyr's .() function.

Reading https://github.com/hadley/devtools/wiki/Computing-on-the-language will probably help you understand whether you really want to do this.

Anyway, one approach is to:

  1. Use .() to create your list of arguments, relying on how summarize works, e.g.

    .(am=mean(a), bm=mean(b), cm=mean(c))
    

    and if you really wanted to use a character string

    foo<- "am=mean(a), bm=mean(b), cm=mean(c)"
    eval(parse(text = sprintf('.(%s)', foo )))
    
  2. Use quote liberally to create the list to be passed to do.call

for example

lapply(list.1, function(x) do.call(ddply,c(.data = quote(x), 
    .variables = quote(.(d)), .fun = quote(summarize),
      .(am=mean(a), bm=mean(b), cm=mean(c)))))

Oh boy is that ugly.
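
To see what the parse-and-evaluate step does in isolation, here is a minimal base-R sketch. It wraps the string in list(...) instead of plyr's .(), and the names spec, call_obj, toy and res are made up for illustration; it is not the ddply call itself, only the mechanism behind it.

```r
# A string of named expressions, in the same style as foo above
spec <- "am = mean(a), bm = mean(b), cm = mean(c)"

# Turn the string into an unevaluated call: list(am = mean(a), ...)
call_obj <- parse(text = sprintf("list(%s)", spec))[[1]]

# Hypothetical toy data, only for illustration
toy <- data.frame(a = 1:4, b = c(2, 4, 6, 8), c = c(10, 10, 20, 20))

# Evaluate the call with the data frame's columns in scope
res <- eval(call_obj, toy)
str(res)
```

Because the string is executed as code, this is only reasonable for strings you build yourself, such as doRSE in the question.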

Or, you could use data.tables

library(data.table)


listDT <- lapply(list.1, data.table)


lapply(listDT, function(x) x[,lapply(.SD, mean), by = 'd'])

or

mystuff <- sprintf('list(%s)', foo)
lapply(listDT, function(x) x[, eval(parse(text = mystuff)), by = 'd'])

However, if you had all the same columns in all your data.tables, it would be more efficient to create one large data.table (with an identifier for each element of the list) and work on that.
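
A sketch of that single-table idea, assuming every element of the list has the same columns. The small list.1 here is a stand-in with made-up values, and the column name "source" is an arbitrary choice for the identifier.

```r
library(data.table)

# Small stand-in for list.1 (hypothetical values, same column layout)
list.1 <- list(
  df1 = data.frame(a = c(1, 3), b = c(2, 4), c = c(5, 7), d = c("a", "a")),
  df2 = data.frame(a = c(10, 20), b = c(30, 50), c = c(1, 1), d = c("e", "f"))
)

# Stack the list into one table; idcol stores each element's name
big <- rbindlist(list.1, idcol = "source")

# One grouped call replaces the lapply over the list
means <- big[, lapply(.SD, mean), by = c("source", "d")]
print(means)
```

Grouping by both "source" and "d" keeps the results for each original data frame separate, so the output carries the same information as the list of summaries.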



Answer 2:

Here's a ddply function that calculates the mean for all the columns that aren't d in your dataframes:

lapply(list.1,
       function(x) {
         ddply(
           x,
           .(d),
           function(df_part) {
             result_df <- data.frame(d=df_part$d[1])
             non_d_cols <- colnames(df_part)[! colnames(df_part) == "d"]
             for (col in non_d_cols) {
               col_mean <- mean(df_part[[col]])
               col_name <- paste0(col, "_mean")
               result_df[[col_name]] <- col_mean
             }
             return(result_df)
           })
       })

That seems to me like the simplest way to do it, and it should generalize well to other calculations you might want to do on those columns. Maybe you could pass in a character vector argument of the columns you want to calculate the mean for, and use that in place of non_d_cols.
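
The character-vector variant suggested above might look something like this. The helper name mean_by_d, its cols argument and the toy data are hypothetical, chosen only to illustrate the idea.

```r
library(plyr)

# Hypothetical helper: means of the columns named in `cols`, per level of d
mean_by_d <- function(df, cols) {
  ddply(df, "d", function(df_part) {
    out <- lapply(df_part[cols], mean)   # one mean per requested column
    names(out) <- paste0(cols, "_mean")  # e.g. "a_mean"
    as.data.frame(out)                   # ddply adds the d column itself
  })
}

# Toy data, made up for illustration
toy <- data.frame(a = c(1, 3, 5), b = c(2, 4, 6), d = c("x", "x", "y"))
res <- mean_by_d(toy, c("a", "b"))
print(res)
```

With the data from the question, it could be applied to every element with lapply(list.1, mean_by_d, cols = c("a", "b", "c")).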