I'm looking for a way to create a comparison-of-means (t-test) table from the output of a tabstat command. Basically, I want to know if the mean of each group is statistically significantly different from the mean for the variable overall.
I have 75 variables across 15 groups for a total of 1125 t-tests, so doing them one at a time is out of the question.
I could always write a loop for the tests, but I was wondering if there was a command similar to tabstat that would make the table for me. Google has been unhelpful thus far, even though it seems like a fairly logical place to go from a tabstat output.
Thanks!
There may be packages that serve you better, but here is an example I just put together. It assumes a one-sample t-test, testing each group's mean against the overall (grand) mean of the variable, because I can't see another way to do this with a t-test. The code returns a matrix with three rows per variable: the difference from the grand mean, the t value, and the p value.
Feel free to adapt the code as you see fit. It would take only a few more steps to turn it into an ado file.
sysuse auto, clear
loca varlist mpg weight length price // put varlist here
loca grpvar foreign // put grouping variable here
loca n_var=wordcount("`varlist'")
qui tab `grpvar'
loca n_grp=`r(r)'
mat T=J(`n_var'*3,`n_grp',.) // (# of vars*3, # of groups,.)
**colnames
loca cnames=""
su `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
    loca cnames="`cnames'"+" "+"`i'"
}
mat colnames T=`cnames' // values of grouping variable
**rownames
loca rnames=""
forval i=1/`n_var' {
    loca var=word("`varlist'",`i')
    loca rnames="`rnames'"+" "+"`var':diff `var':t `var':p"
}
mat rownames T=`rnames' // difference, t value, p value
loca i=1
foreach var in `varlist' {
    loca j=1
    su `var', meanonly
    loca ydbhat=`r(mean)' // y double hat (grand mean), computed once per variable
    su `grpvar', meanonly
    forval f=`r(min)'/`r(max)' {
        su `var' if `grpvar'==`f', meanonly
        loca diff=`r(mean)'-`ydbhat' // group mean minus grand mean, same sign as t
        qui ttest `var'==`ydbhat' if `grpvar'==`f' // one-sample ttest
        mat T[`i',`j']=`diff'
        mat T[`i'+1,`j']=`r(t)'
        mat T[`i'+2,`j']=`r(p)'
        loca ++j
    }
    loca i=`i'+3
}
mat list T, f(%8.3f)
I am not sure whether 15 columns would be too wide to display. If so, change the display format, or just use putexcel to export the matrix to a spreadsheet.
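For the putexcel route, a minimal sketch could look like the following. It assumes Stata 14 or later (older versions use a different putexcel syntax) and that the matrix T built above is still in memory; the file name ttest_table.xlsx is just a placeholder.

```stata
* Assumption: matrix T from the code above is still in memory.
* "ttest_table.xlsx" is a hypothetical file name; change it as needed.
putexcel set "ttest_table.xlsx", replace

* Write T starting at cell A1, including its row and column names.
putexcel A1 = matrix(T), names
```

The names option writes the matrix labels along with the values, so the group values and the diff/t/p labels should travel with the numbers.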
Edited: Changed the forval i=0/1 in the loops to a more generally applicable form, plus other minor edits.
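On the ado-file remark: one possible way to wrap the same logic in a program is sketched below. The program name grpttest and the by() option are made up for illustration, and levelsof replaces the min/max loop so the group values need not be consecutive. It still assumes the grouping variable holds non-negative integers.

```stata
* Hypothetical wrapper; the name grpttest and the by() option are assumptions.
capture program drop grpttest
program define grpttest, rclass
    version 13
    syntax varlist(numeric) [if] [in], BY(varname numeric)
    marksample touse, novarlist      // honor if/in only
    markout `touse' `by'             // drop observations with a missing group

    quietly levelsof `by' if `touse', local(levels)
    local n_grp : word count `levels'
    local n_var : word count `varlist'

    tempname T
    matrix `T' = J(`n_var'*3, `n_grp', .)
    matrix colnames `T' = `levels'

    local rnames
    local i = 1
    foreach var of local varlist {
        local rnames `rnames' `var':diff `var':t `var':p
        summarize `var' if `touse', meanonly
        local grand = r(mean)        // grand mean of this variable
        local j = 1
        foreach f of local levels {
            * one-sample t-test of the group mean against the grand mean
            quietly ttest `var' == `grand' if `touse' & `by' == `f'
            matrix `T'[`i', `j']   = r(mu_1) - `grand'
            matrix `T'[`i'+1, `j'] = r(t)
            matrix `T'[`i'+2, `j'] = r(p)
            local ++j
        }
        local i = `i' + 3
    }
    matrix rownames `T' = `rnames'
    matrix list `T', format(%8.3f)
    return matrix T = `T'
end

* Example call on the auto data
sysuse auto, clear
grpttest mpg weight length price, by(foreign)
```

Saved as grpttest.ado somewhere on the adopath (without the last two lines), this would behave like any other command, and the table stays available afterwards as r(T).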
I edited the code a bit. Since comments can't hold formatted code, I'm posting it as a new answer. This version runs a two-sample t-test (via ttesti, comparing each cluster with the full sample) and also reports the cluster mean for each variable.
local varlist var1 var2 var3 // put varlist here
local grpvar _clus_1 // put grouping variable here
local n_var=wordcount("`varlist'")
qui summ `grpvar', meanonly
local n_grp=`r(max)'-`r(min)'+1 // one column per value from min to max
mat T=J(`n_var'*4,`n_grp',.) // (# of vars*4,# of groups,.)
**colnames
local cnames=""
qui summ `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
    local cnames="`cnames'"+" "+"`i'"
}
//di "`cnames'"
mat colnames T=`cnames' // values of grouping variable
**rownames
local rnames=""
forval i=1/`n_var' {
    local var=word("`varlist'",`i')
    local rnames="`rnames'"+" "+"`var':mean `var':diff `var':t-stat `var':p-value"
}
mat rownames T=`rnames' // mean, difference, t value, p value
local i=1
foreach var in `varlist' {
    local j=1
    **overall mean, N and sd: the same for every cluster, so computed once per variable
    qui summ `var'
    local varmean=`r(mean)'
    local varn = `r(N)'
    local varsd = `r(sd)'
    qui summ `grpvar', meanonly
    forval f=`r(min)'/`r(max)' {
        qui summ `var' if `grpvar'==`f'
        local clusmean = `r(mean)'
        local clusn = `r(N)'
        local clussd = `r(sd)'
        local diff=`clusmean'-`varmean' // difference
        **two-sample t-test, cluster listed first so t has the same sign as diff
        qui ttesti `clusn' `clusmean' `clussd' `varn' `varmean' `varsd'
        mat T[`i',`j']=`clusmean'
        mat T[`i'+1,`j']=`diff'
        mat T[`i'+2,`j']=`r(t)'
        mat T[`i'+3,`j']=`r(p)'
        local ++j
    }
    local i=`i'+4
}
mat list T, f(%8.3f)