Automate Chi-square across categories and columns

I have a survey dataframe containing several questions (columns) coded as 1=agree/0=disagree. Respondents (rows) are categorized according to metrics "age" ("young","middle","old"), "region" ("East","Mid","West"), etc. There are around 30 categories in total (3 ages, 3 regions, 2 genders, 11 occupations, etc.). Within each metric, categories are non-overlapping and of different sizes.

This simulates a cut-down version of the dataset:

n<-400
set.seed(1)
data<-data.frame(age=sample(c('young','middle','old'),n,replace=T),region=sample(c('East','Mid','West'),n,replace=T),gender=sample(c('M','F'),n,replace=T),Q15a=sample(c(0,1),n,replace=T),Q15b=sample(c(0,1),n,replace=T))

I can use Chi-square to test if the responses in, say, the West differ significantly from the total sample, for Q15a, with:

attach(data)
chisq.test(table(subset(data,region=='West')$Q15a),p=table(Q15a),rescale.p=T)

I want to test all categories against the total sample for Q15a, and then for ~20 other questions. As there are around 30 tests per question, I want to find a way (efficient or otherwise) to automate this, but am struggling to see how to get R to do this itself or how to write a loop to cycle through the categories. I've searched[1], and got sidetracked into pairwise comparison testing with pairwise.prop.test(), but haven't found anything that really answers this yet.

[1] similar but not duplicate questions (both are column-wise tests):

Using loops to do Chi-Square Test in R

Chi Square Analysis using for loop in R

标签： r chi-squared

2条回答

我只想做你的唯一

2楼-- · 2019-04-09 13:49

How about this?

# find all question columns containing Q, your "subset" may differ
nms <- names(data)
nms <- nms[grepl("Q", nms)]

result <- sapply(nms, FUN = function(x, data) {
  qinq <- data[, c("region", x)]
  by(data = qinq, INDICES = data$region, FUN = function(y, qinq) {
    chisq.test(table(y[, x]), p =  table(qinq[, x]), rescale.p = TRUE)
  }, qinq = qinq)
}, data = data, simplify = FALSE)

$Q15a
data$region: East

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.7494, df = 1, p-value = 0.3867

--------------------------------------------------------------------------------------------- 
data$region: Mid

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.2249, df = 1, p-value = 0.6353

--------------------------------------------------------------------------------------------- 
data$region: West

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 1.5877, df = 1, p-value = 0.2077


$Q15b
data$region: East

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.0697, df = 1, p-value = 0.7918

--------------------------------------------------------------------------------------------- 
data$region: Mid

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0, df = 1, p-value = 0.9987

--------------------------------------------------------------------------------------------- 
data$region: West

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.056, df = 1, p-value = 0.8129

You can extract anything you want. Here's how you would extract a p.value.

lapply(result, FUN = function(x) lapply(x, "[", "p.value"))

$Q15a
$Q15a$East
$Q15a$East$p.value
[1] 0.3866613


$Q15a$Mid
$Q15a$Mid$p.value
[1] 0.6353457


$Q15a$West
$Q15a$West$p.value
[1] 0.2076507



$Q15b
$Q15b$East
$Q15b$East$p.value
[1] 0.7918426


$Q15b$Mid
$Q15b$Mid$p.value
[1] 0.9986924


$Q15b$West
$Q15b$West$p.value
[1] 0.8128969

Happy formatting.

0人赞添加讨论(0) 举报

女痞

3楼-- · 2019-04-09 14:01

You may also use chisq.desc() function from EnQuireR package. It worked fine for me. ALthough there is very less support available and this package is quite old (no updates from long), so few functions were not working but I find chisq.desc() useful. It Color the cells of the table containing the results from the Chi-square test, crossing all the selected categorical variables, according to a selected threshold. I am unable to comment, so writing as an answer.

0人赞添加讨论(0) 举报

Automate Chi-square across categories and columns

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间