I have a survey dataframe containing several questions (columns) coded as 1=agree/0=disagree. Respondents (rows) are categorized according to metrics "age" ("young","middle","old"), "region" ("East","Mid","West"), etc. There are around 30 categories in total (3 ages, 3 regions, 2 genders, 11 occupations, etc.). Within each metric, categories are non-overlapping and of different sizes.
This simulates a cut-down version of the dataset:
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c('young', 'middle', 'old'), n, replace = TRUE),
  region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
  gender = sample(c('M', 'F'), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)
I can use Chi-square to test if the responses in, say, the West differ significantly from the total sample, for Q15a, with:
# Observed counts in the West vs. proportions in the total sample
chisq.test(table(subset(data, region == 'West')$Q15a), p = table(data$Q15a), rescale.p = TRUE)
I want to test all categories against the total sample for Q15a, and then for ~20 other questions. As there are around 30 tests per question, I want to find a way (efficient or otherwise) to automate this, but am struggling to see how to get R to do this itself or how to write a loop to cycle through the categories. I've searched[1], and got sidetracked into pairwise comparison testing with pairwise.prop.test(), but haven't found anything that really answers this yet.
[1] similar but not duplicate questions (both are column-wise tests):
How about this?
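One way to do it, as a minimal sketch rather than a definitive implementation: wrap the goodness-of-fit call from the question in a small function and apply it over every question, metric and category with nested `lapply()`. The helper name `test_category` and the vectors `metrics` and `questions` are my own assumptions for illustration; the simulated data is copied from the question.

```r
# Simulated data, as in the question
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c("young", "middle", "old"), n, replace = TRUE),
  region = sample(c("East", "Mid", "West"), n, replace = TRUE),
  gender = sample(c("M", "F"), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)

metrics   <- c("age", "region", "gender")  # grouping columns (assumed names)
questions <- c("Q15a", "Q15b")             # 0/1 response columns (assumed names)

# Hypothetical helper: test one category of one metric against the total sample.
# factor(..., levels = c(0, 1)) keeps both cells even if one response is absent.
test_category <- function(df, metric, level, question) {
  overall  <- table(factor(df[[question]], levels = c(0, 1)))
  observed <- table(factor(df[[question]][df[[metric]] == level], levels = c(0, 1)))
  chisq.test(observed, p = overall, rescale.p = TRUE)
}

# One test per question x metric x category, stored in a nested named list
results <- lapply(setNames(questions, questions), function(q) {
  lapply(setNames(metrics, metrics), function(m) {
    lv <- sort(unique(as.character(data[[m]])))
    lapply(setNames(lv, lv), function(l) test_category(data, m, l, q))
  })
})

# Same test as the single call in the question
results$Q15a$region$West
```

To cover the real dataset, only `metrics` and `questions` should need to change, assuming every question is coded 0/1.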
You can extract anything you want. Here's how you would extract a p.value.
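As a rough sketch of that extraction step: `chisq.test()` returns an object of class `htest`, and its `p.value` element can be pulled out with `$`. The helper `p_for` and the `combos` table are hypothetical names I am introducing here, and the data is again the simulated sample from the question.

```r
# Simulated data, as in the question
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c("young", "middle", "old"), n, replace = TRUE),
  region = sample(c("East", "Mid", "West"), n, replace = TRUE),
  gender = sample(c("M", "F"), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)

metrics <- c("age", "region", "gender")  # grouping columns (assumed names)

# Hypothetical helper: p-value for one category of one metric vs. the total sample
p_for <- function(metric, level, question = "Q15a") {
  overall  <- table(factor(data[[question]], levels = c(0, 1)))
  observed <- table(factor(data[[question]][data[[metric]] == level], levels = c(0, 1)))
  chisq.test(observed, p = overall, rescale.p = TRUE)$p.value
}

# One row per metric/category combination
combos <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(metric = m,
             category = sort(unique(as.character(data[[m]]))),
             stringsAsFactors = FALSE)
}))

# Attach the p-value of each test for Q15a
combos$p.value <- unname(mapply(p_for, combos$metric, combos$category))
combos
```

Other elements of the test object, such as `statistic` or `expected`, can be extracted the same way.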
Happy formatting.
You could also use the chisq.desc() function from the EnQuireR package; it worked fine for me. The package is quite old, with very little support and no updates for a long time, so a few of its functions did not work, but I still found chisq.desc() useful. It colors the cells of the table containing the results of the Chi-square tests, crossing all the selected categorical variables, according to a selected threshold. I am unable to comment, so I am writing this as an answer.