t.test: create lapply function for multiple grouping

Posted 2019-08-17 10:58

Question:

I'm trying to create an lapply function to run multiple t.tests for multiple levels of grouping. I came across the question "Kruskal-Wallis test: create lapply function to subset data.frame?", but it only grouped by one variable (phase). I would like to add a second grouping level, color. My IV is distance and my DV is val, grouped by color and then by phase.

# create data
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))   # only ever 1 or 2
phase    <- rep(c("a", "b", "c"), 20)
color    <- rep(c("red", "blue", "green", "yellow", "purple"), 12)

df <- data.frame(val, distance, phase, color)
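Before splitting, it can help to check how small the groups are. A quick sketch (not part of the original question; set.seed() and the name cell_counts are additions for illustration) showing that every color/phase combination gets exactly 4 rows and that distance only takes the values 1 and 2, which is why some groups can end up with a single distance value:

```r
# Recreate the question's sample data; set.seed() is added here only so
# the random draws repeat between runs
set.seed(1)
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase    <- rep(c("a", "b", "c"), 20)
color    <- rep(c("red", "blue", "green", "yellow", "purple"), 12)
df <- data.frame(val, distance, phase, color)

# phase cycles every 3 rows and color every 5, so each of the
# 5 x 3 = 15 color/phase cells receives exactly 60 / 15 = 4 rows
cell_counts <- table(df$color, df$phase)
print(cell_counts)

# floor() of a uniform draw strictly between 1 and 3 is 1 or 2
print(table(df$distance))
```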

The answer there, which groups by phase only, was:

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

However, this does not account for the second grouping level (color). I may be approaching this the wrong way, so any help is appreciated.

Answer 1:

Simply pass a list() of the needed column(s) to split(). However, with your sample data this raises an error, because in some groups all observations share the same distance value:

lapply(split(df, list(df$color, df$phase)), function(d) {
    kruskal.test(val ~ distance, data=d) 
})

Error in kruskal.test.default(c(76.6759299905971, 3.11371604911983, 17.6471394719556, : all observations are in the same group
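The error comes from kruskal.test itself: it needs at least two distinct distance values inside a subset. A sketch (an addition to the answer; groups, n_distances and bad are illustrative names) that lists, before running any test, which color.phase subsets contain only one distance value and would therefore fail:

```r
set.seed(1)  # added only so the random sample is reproducible
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = rep(c("a", "b", "c"), 20),
  color    = rep(c("red", "blue", "green", "yellow", "purple"), 12)
)

# split() on a list of columns names each piece "color.phase", e.g. "red.a"
groups <- split(df, list(df$color, df$phase))

# Number of distinct distance values inside each subset
n_distances <- sapply(groups, function(d) length(unique(d$distance)))

# Subsets with a single distance value are the ones where
# kruskal.test(val ~ distance) stops with "all observations are in the same group"
bad <- names(n_distances)[n_distances < 2]
print(bad)
```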

Consequently, consider wrapping the call in tryCatch to return NA (or any other object) for those problem groups:

lapply(split(df, list(df$color, df$phase)), function(d) {
    tryCatch({ kruskal.test(val ~ distance, data=d) },
             error = function(e) NA)
})
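The question's title mentions t.test, and in this sample distance takes only the values 1 and 2, so the same split/tryCatch pattern could run a two-sample (Welch) t.test per color/phase group instead. A minimal sketch, assuming t.test is what the asker ultimately wants (it is swapped in here for kruskal.test), that also flattens the results to one p-value per group. Groups where t.test cannot run (only one distance value, or a single observation at one distance) get NA; res_t and p_t are illustrative names.

```r
set.seed(1)  # added only so the random sample is reproducible
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = rep(c("a", "b", "c"), 20),
  color    = rep(c("red", "blue", "green", "yellow", "purple"), 12)
)

# One Welch two-sample t-test of val by distance per color.phase subset;
# subsets where t.test() stops with an error yield NA instead
res_t <- lapply(split(df, list(df$color, df$phase)), function(d) {
  tryCatch(t.test(val ~ distance, data = d),
           error = function(e) NA)
})

# Flatten to one value per subset: htest objects carry $p.value, NA does not
p_t <- vapply(res_t, function(r) {
  if (inherits(r, "htest")) r$p.value else NA_real_
}, numeric(1))
print(data.frame(group = names(p_t), p.value = unname(p_t)))
```

Keeping NA placeholders (rather than dropping failed groups) preserves all 15 color.phase names, so it stays visible which combinations could not be tested.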

By the way, consider by() (an object-oriented wrapper for tapply and an often-overlooked member of the apply family) instead of nesting split inside lapply:

by(df, df[c("color", "phase")], function(d) {
    tryCatch({ kruskal.test(val ~ distance, data=d) },
             error = function(e) NA)
})
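One practical difference is the shape of the result: by() returns a list-array indexed by color and phase rather than a flat list. A minimal sketch (not from the original answer; p and p_mat are illustrative names) that pulls one Kruskal-Wallis p-value per cell into a color by phase matrix, with NA where the test could not run:

```r
set.seed(1)  # added only so the random sample is reproducible
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = rep(c("a", "b", "c"), 20),
  color    = rep(c("red", "blue", "green", "yellow", "purple"), 12)
)

res <- by(df, df[c("color", "phase")], function(d) {
  tryCatch(kruskal.test(val ~ distance, data = d),
           error = function(e) NA)
})

# Walk the 15 cells (column-major, color varying fastest) and keep
# the p-value of each htest, or NA for the failed cells
p <- vapply(unclass(res), function(r) {
  if (inherits(r, "htest")) r$p.value else NA_real_
}, numeric(1))

# Restore the color x phase layout that by() used
p_mat <- array(p, dim = dim(res), dimnames = dimnames(res))
print(round(p_mat, 3))
```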