I'm using a function smean.cl.normal
from Hmisc package that returns a vector with 3 values: the mean and the lower and upper CI. When I use it on a data.table
with 2 groups, I obtain 2 columns and 6 rows. Is there a way to obtain the result with two rows corresponding to 2 groups and separate columns for each of function's outputs, i.e. the mean and CIs?
require(Hmisc)
require(data.table)
dt = data.table(x = rnorm(100),
gr = rep(c('A', 'B'), each = 50))
dt[, lapply(.SD, smean.cl.normal), by = gr, .SDcols = "x"]
The output:
gr x
1: A -0.07916335
2: A -0.33656667
3: A 0.17823998
4: B -0.02745333
5: B -0.32950607
6: B 0.27459941
The desired output:
gr Mean Lower Upper
1: A -0.07916335 -0.33656667 0.17823998
2: B -0.02745333 -0.32950607 0.27459941
The j
argument in DT[i,j,by]
expects a list, so use as.list
:
dt[,
Reduce(c, lapply(.SD, function(x) as.list(smean.cl.normal(x))))
, by = gr, .SDcols = "x"]
# gr Mean Lower Upper
# 1: A 0.1032966 -0.1899466 0.3965398
# 2: B -0.1437617 -0.4261330 0.1386096
c(L1, L2, L3)
is how lists are combined, so Reduce(c, List_o_Lists)
does the trick in case your .SDcols
contains more than just x
. I guess do.call(c, List_o_Lists)
should also work.
Comments
This is quite inefficient for a couple of reasons. Turn on verbose=TRUE
to see that data.table doesn't like getting named lists in j
:
The result of j is a named list. It's very inefficient to create the same names over and over again for each group. When j=list(...), any names are detected, removed and put back after grouping has completed, for efficiency. Using j=transform(), for example, prevents that speedup (consider changing to :=). This message may be upgraded to warning in future.
Also, you are missing out on group-optimized versions of mean
and other functions that can probably be used to build your result. This may not be a big deal for your use-case, though.
When you're applying this to only a single value column, just:
dt[, as.list(smean.cl.normal(x)), by = gr]
suffices.