可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a dataset that I need to both split by one variable (Day) and then compare between groups of another variable (Group), performing per-group statistics (e.g. mean) and also tests.

Here's an example of what I devised:

require(data.table)
data = data.table(Day = rep(1:10, each = 10), 
                  Group = rep(1:2, times = 50),
                  V = rnorm(100))

data[, .(g1_mean = mean(.SD[Group == 1]$V),
         g2_mean = mean(.SD[Group == 2]$V),
         p.value = t.test(V ~ Group, .SD, alternative = "two.sided")$p.value),
     by = list(Day)]

Which produces:

    Day      g1_mean     g2_mean   p.value
 1:   1  0.883406048  0.67177271 0.6674138
 2:   2  0.007544956 -0.55609722 0.3948459
 3:   3  0.409248637  0.28717183 0.8753213
 4:   4 -0.540075365  0.23181458 0.1785854
 5:   5 -0.632543900 -1.09965990 0.6457325
 6:   6 -0.083221671 -0.96286343 0.2011136
 7:   7 -0.044674252 -0.27666473 0.7079499
 8:   8  0.260795244 -0.15159164 0.4663712
 9:   9 -0.134164758  0.01136245 0.7992453
10:  10  0.496144329  0.76168408 0.1821123

I'm hoping that there's a less roundabout manner of arriving at this result.

回答1:

A possible compact alternative which can also apply more functions to each group:

DTnew <- dcast(DT[, pval := t.test(V ~ Group, .SD, alternative = "two.sided")$p.value, Day],
               Day + pval ~ paste0("g",Group), fun = list(mean,sd), value.var = "V")

which gives:

> DTnew
    Day      pval   V_mean_g1    V_mean_g2   V_sd_g1   V_sd_g2
 1:   1 0.4763594 -0.11630634  0.178240714 0.7462975 0.4516087
 2:   2 0.5715001 -0.29689807  0.082970631 1.3614177 0.2745783
 3:   3 0.2295251 -0.48792449 -0.031328749 0.3723247 0.6703694
 4:   4 0.5565573  0.33982242  0.080169698 0.5635136 0.7560959
 5:   5 0.5498684 -0.07554433  0.308661427 0.9343230 1.0100788
 6:   6 0.4814518  0.57694034  0.885968245 0.6457926 0.6773873
 7:   7 0.8053066  0.29845913  0.116217727 0.9541060 1.2782210
 8:   8 0.3549573  0.14827289 -0.319017581 0.5328734 0.9036501
 9:   9 0.7290625 -0.21589411 -0.005785092 0.9639758 0.8859461
10:  10 0.9899833  0.84034529  0.850429982 0.6645952 1.5809149

A decomposition of the code:

First, a pval variable is added to the dataset with DT[, pval := t.test(V ~ Group, .SD, alternative = "two.sided")$p.value, Day]
Because DT is updated in place and by reference by the previous step, the dcast function can be applied to that directly.
In the casting formula, you specify the variables that need to stay in the current form on the RHS and the variable that needs to be spread over columns on the LHS.
With the fun argument you can specify which aggregation function has to be used on the value.var (here V). If multiple aggregation functions are needed, you can specify them in a list (e.g. list(mean,sd)). This can be any type of function. So, also cumstom made functions can be used.

If you want to remove the V_ from the column names, you can do:

names(DTnew) <- gsub("V_","",names(DTnew))

NOTE: I renamed the data.table to DT as it is often not wise to name your dataset after a function (check ?data)

回答2:

While not a one-liner, you might consider doing your two processes separate and then merging the results. This prevents you from having to hardcode the group-names.

First, we calculate the means:

my_means <- dcast(data[,mean(V), by = .(Day, Group)],
                  Day~ paste0("Mean_Group", Group),value.var="V1")

Or in the less-convoluted way @Akrun mentioned in the comments, with some added formatting.

my_means <- dcast(Day~paste0("Mean_Group", Group), data=data,
              fun.agg=mean, value.var="V")

Then the t-tests:

t_tests <- data[,.(p_value=t.test(V~Group)$p.value), by = Day]

And then merge:

output <- merge(my_means, t_tests)

Comparing Groups in data.table Columns

问题:

回答1:

回答2:

收藏的人(0)

Comparing Groups in data.table Columns

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮