R: Dynamically build “list” in data.table (or ddply)

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically.

As a minimal example:

require(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
a <- (rep(1:3, 3))
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
DT <-data.table(df)

This call:

DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]

will have result similar to this:

    type suma     meanb      minc
1: hello    6 0.1332210 0.4265579
2:   bye    6 0.5680839 0.2993667
3:    ok    6 0.5694532 0.2069026

Future data.frames will have more columns that I will want to summarize differently. But for the sake of working with this small example: is there a way to pass the list programmatically?

I naïvely tried:

# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]

This does not raise an error, but the output is not what I wanted:

    type
1: hello
2:   bye
3:    ok
   mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

Any hints appreciated! Best regards!

PS sorry about these as.numeric(), I could not quite figure out why, but I needed them for the example to run.
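On the as.numeric() calls: the cause is the cbind() step in the example. cbind() builds a matrix, a matrix can hold only one type, and so the numeric columns are converted to character before data.frame() ever sees them. A small sketch of the difference (the names m and df2 are only for illustration):

```r
type <- c(rep("hello", 3), rep("bye", 3), rep("ok", 3))
a <- rep(1:3, 3)
b <- runif(9)

# cbind() returns a matrix; mixing character and numeric vectors
# therefore gives a character matrix.
m <- cbind(type, a, b)
is.character(m)

# Passing the vectors directly to data.frame() keeps each column's own type,
# so sum(a) or mean(b) would work without as.numeric().
df2 <- data.frame(type, a, b, stringsAsFactors = FALSE)
sapply(df2, class)
```

With a data.frame built like df2, the as.numeric() wrappers in the calls above should no longer be needed.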

Minor edit: inserted "columns /" before "data.frame" in the initial sentence to clarify my needs.

Tags: r data.table plyr aggregation

4 Answers

爷、活的狠高调

Reply #2 · 2019-01-18 12:21

This is explained in data.table FAQ 1.6. What you are looking for is quote and eval,

something like

mycall <- quote(list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c))))

DT[, eval(mycall), by = type]
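If the point is to build the call from data rather than type it inside quote(), one possible sketch is to assemble it with call() and as.call(). The spec list and its layout (output name mapped to a function name and a column name) are assumptions made up for this illustration, and the example table uses genuinely numeric columns so as.numeric() is left out:

```r
library(data.table)

# Hypothetical specification: output name -> c(function name, column name)
spec <- list(lengtha = c("length", "a"),
             maxb    = c("max", "b"),
             meanc   = c("mean", "c"))

# Turn each entry into an unevaluated call such as length(a)
calls <- lapply(spec, function(s) call(s[1], as.name(s[2])))

# Wrap the named calls in list(...) to obtain the j expression
mycall <- as.call(c(as.name("list"), calls))

DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

res <- DT[, eval(mycall), by = type]
res
```

Printing mycall before using it is a cheap way to check that the generated expression is the one you meant.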

After a bit of head-banging, here is a very ugly way of constructing the call for ddply using .()

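library(plyr)  # provides .(), ddply() and summarise()

myplyrcall <- .(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

# df is the data.frame from the question (R is case sensitive, so not DF)
do.call(ddply, c(.data = quote(df), .variables = 'type', .fun = quote(summarise), myplyrcall))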

You could also use as.quoted, which has an as.quoted.character method, so the expressions can be built from strings (for example with paste0):

myplc <-as.quoted(c("lengtha" = "length(as.numeric(a))", "maxb" = "max(as.numeric(b))", "meanc" = "mean(as.numeric(c))"))

This can be used with data.table as well!

dtcall <- as.quoted(mylist)[[1]]


DT[,eval(dtcall), by = type]

data.table all the way.


唯我独甜

Reply #3 · 2019-01-18 12:21

Another method (supporting the use of paste or paste0 to build the expression):

expr <- parse(text=mylist)
DT[, eval( expr ), by=type]
#-------
    type lengtha      maxb     meanc
1: hello       3 0.8265407 0.5244094
2:   bye       3 0.4955301 0.6289475
3:    ok       3 0.9527455 0.5600915
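To make the paste0 part concrete, here is a rough sketch of assembling such a string from two vectors. The vectors funs and cols and the name mylist2 are invented for the example, and the table has numeric columns so no as.numeric() is needed. Since parse() will run whatever text it is given, this is probably best kept to strings you build yourself:

```r
library(data.table)

funs <- c("length", "max", "mean")  # hypothetical choice of functions
cols <- c("a", "b", "c")            # columns they are applied to

# Pieces such as "lengtha = length(a)", joined inside list(...)
pieces  <- paste0(funs, cols, " = ", funs, "(", cols, ")")
mylist2 <- paste0("list(", paste(pieces, collapse = ", "), ")")
mylist2

DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

expr <- parse(text = mylist2)
res <- DT[, eval(expr), by = type]
res
```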


干净又极端

Reply #4 · 2019-01-18 12:27

Another way is to use .SDcols to group together the columns on which you'd like to perform the same operation. Let's say that you need columns a, d and e to be summed by type, whereas b and g should have their mean taken and c and f their median. Then:

# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                 b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                 f = runif(9), g=rnorm(9))

#     type a          b         c d e         f          g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
# 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
# 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
# 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
# 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
# 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
# 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758

# 1) set key
setkey(dt, "type")

# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))

# 3) now use these ids with the .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]

# 4) merge them.
dt1[dt2[dt3]]

#     type  a  d  e          b          g         c         f
# 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317

If/when you have many, many columns, writing out a list like the one you have might become cumbersome.
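The same idea can be driven by a specification instead of one hand-written line per operation. This is only a sketch: the spec list (function name mapped to column names) is a made-up convention, and merge() is used here in place of the keyed join above:

```r
library(data.table)

set.seed(45)
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = sample(9), b = rnorm(9), c = runif(9),
                 d = sample(9), e = sample(9), f = runif(9), g = rnorm(9))

# Hypothetical specification: function name -> columns it applies to
spec <- list(sum    = c("a", "d", "e"),
             mean   = c("b", "g"),
             median = c("c", "f"))

# One grouped summary per function, each restricted through .SDcols
parts <- lapply(names(spec), function(fn) {
  dt[, lapply(.SD, match.fun(fn)), by = type, .SDcols = spec[[fn]]]
})

# Join the pieces on the grouping column
res <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
res
```

Adding a column or an operation then only means editing spec.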


聊天终结者

Reply #5 · 2019-01-18 12:34

I find it worrisome that eval is apparently part of the answer. From your question it is not clear to me whether and why you really want to do what you claim to want. So here I demonstrate that you can also use a function:

fun <- function(a,b,c) {
  list(lengtha = length(as.numeric(a)), 
          maxb = max(as.numeric(b)), 
         meanc = mean(as.numeric(c)))  
}

DT[, fun(a,b,c), by=type]

    type lengtha      maxb     meanc
1: hello       3 0.8792184 0.3745643
2:   bye       3 0.8718397 0.4519999
3:    ok       3 0.8900764 0.4511536
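If the set of summaries has to vary at run time, the function approach can probably be stretched a little further by passing the functions themselves as data. In this sketch summarise_cols, funs and cols are all hypothetical names, and the table has numeric columns so as.numeric() is not needed:

```r
library(data.table)

# Hypothetical helper: applies funs[[i]] to column cols[i] of one group's data
summarise_cols <- function(sd, funs, cols) {
  out <- Map(function(f, col) f(sd[[col]]), funs, cols)
  names(out) <- names(funs)  # output column names come from funs
  out
}

funs <- list(lengtha = length, maxb = max, meanc = mean)
cols <- c("a", "b", "c")

DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

res <- DT[, summarise_cols(.SD, funs, cols), by = type, .SDcols = cols]
res
```

No expression is parsed or evaluated here; changing the aggregation means changing the two vectors.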

