My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically.
As a minimal example:
require(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
a <- (rep(1:3, 3))
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
DT <-data.table(df)
This call:
DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]
will have result similar to this:
type suma meanb minc
1: hello 6 0.1332210 0.4265579
2: bye 6 0.5680839 0.2993667
3: ok 6 0.5694532 0.2069026
Future data.frames will have more columns that I will want to summarize differently. But for the sake of working with this small example: Is there a way to pass the list programatically?
I naïvely tried:
# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]
With the following error:
1: hello
2: bye
3: ok
mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
Any hints appreciated! Best regards!
PS sorry about these as.numeric()
, I could not quite figure out why, but I needed them for the example to run.
Minor edit inserted columns / before data.frame in initial sentence to clarify my needs.
This is explained FAQ 1.6 what you are looking for is
quote
andeval
something like
After a bit of head-banging, here is a very ugly way of constructing the call for ddply using
.()
You could also use
as.quoted
which has anas.quoted.character
method to construct usingpaste0
This can be used with data.table as well!
data.table
all the way.Another method (supporting the use of
paste
orpaste0
to build the expression):Another way is to use
.SDcols
to group the columns for which you'd like to perform the same operations together. Let's say that you require columnsa,d,e
to be summed bytype
where as,b,g
should havemean
taken andc,f
its median, then,If/when you have many many column, making a list like the one you've might be cumbersome.
I find it worrysome that apparently
eval
is part of the answer. From your question it is not clear to me, if and why you really want to do what you claim to want. Thus I demonstrate here that you can also use a function: