I am trying to understand how to work with ANOVAs and post-hoc tests in R.
So far, I have used aov() and TukeyHSD() to analyse my data. Example:
uni2.anova <- aov(Sum_Uni ~ Micro, data= uni2)
uni2.anova
Call:
aov(formula = Sum_Uni ~ Micro, data = uni2)
Terms:
Micro Residuals
Sum of Squares 0.04917262 0.00602925
Deg. of Freedom 15 48
Residual standard error: 0.01120756
Estimated effects may be unbalanced
My problem is, now I have a huge list of pairwise comparisons but cannot do anything with it:
TukeyHSD(uni2.anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sum_Uni ~ Micro, data = uni2)
$Micro
diff lwr upr p adj
Act_Glu2-Act_Ala2 -0.0180017863 -0.046632157 0.0106285840 0.6448524
Ana_Ala2-Act_Ala2 -0.0250134285 -0.053643799 0.0036169417 0.1493629
NegI_Ala2-Act_Ala2 0.0702274527 0.041597082 0.0988578230 0.0000000
This dataset has 40 rows...
Idealy, I would like to get a dataset that looks something like this:
- Act_Glu2 : a
- Act_Ala2 : a
- NegI_Ala2: b...
I hope you get the point. So far, I have found nothing comparable online... I also tried to select only significant pairs in the file resulting from TukeyHSD, but the file does not "acknowlegde" that it is made up of rows & columns, making selecting impossible...
Maybe there is something fundamentally wrong with my approach?
I think the OP wants the letters to get a view of the comparisons.
library(multcompView)
multcompLetters(extract_p(TukeyHSD(uni2.anova)))
That will get you the letters.
You can also use the multcomp package
library(multcomp)
cld(glht(uni2.anova, linct = mcp(Micro = "Tukey")))
I hope this is what you need.
The results from the TukeyHSD are a list. Use str
to look at the structure. In your case you'll see that it's a list of one item and that item is basically a matrix. So, to extract the first column you'll want to save the TukeyHSD result
hsd <- TukeyHSD(uni2.anova)
If you look at str(hsd)
you can that you can then get at bits...
hsd$Micro[,1]
That will give you the column of your differences. You should be able to extract what you want now.
Hard to tell without example data, but assuming Micro
is just a factor with 4 levels and uni2
looks something like
n = 40
Micro = c('Act_Glu2', 'Act_Ala2', 'Ana_Ala2', 'NegI_Ala2')[sample(4, 40, rep=T)]
Sum_Uni = rnorm(n, 5, 0.5)
Sum_Uni[Micro=='Act_Glu2'] = Sum_Uni[Micro=='Act_Glu2'] + 0.5
uni2 = data.frame(Sum_Uni, Micro)
> uni2
Sum_Uni Micro
1 4.964061 Ana_Ala2
2 4.807680 Ana_Ala2
3 4.643279 NegI_Ala2
4 4.793383 Act_Ala2
5 5.307951 NegI_Ala2
6 5.171687 Act_Glu2
...
then I think what you're actually trying to get at is the basic multiple regression output:
fit = lm(Sum_Uni ~ Micro, data = uni2)
summary(fit)
anova(fit)
> summary(fit)
Call:
lm(formula = Sum_Uni ~ Micro, data = uni2)
Residuals:
Min 1Q Median 3Q Max
-1.26301 -0.35337 -0.04991 0.29544 1.07887
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8364 0.1659 29.157 < 2e-16 ***
MicroAct_Glu2 0.9542 0.2623 3.638 0.000854 ***
MicroAna_Ala2 0.1844 0.2194 0.841 0.406143
MicroNegI_Ala2 0.1937 0.2158 0.898 0.375239
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4976 on 36 degrees of freedom
Multiple R-squared: 0.2891, Adjusted R-squared: 0.2299
F-statistic: 4.88 on 3 and 36 DF, p-value: 0.005996
> anova(fit)
Analysis of Variance Table
Response: Sum_Uni
Df Sum Sq Mean Sq F value Pr(>F)
Micro 3 3.6254 1.20847 4.8801 0.005996 **
Residuals 36 8.9148 0.24763
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You can access the numbers in any of these tables like, for example,
> summary(fit)$coef[2,4]
[1] 0.0008536287
To see the list of what is stored in each object, use names()
:
> names(summary(fit))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
In addition to the TukeyHSD()
function you found, there are many other options for looking at the pairwise tests further, and correcting the p-values if desired. These include pairwise.table()
, estimable()
in gmodels
, the resampling
and boot
packages, and others...