Chi-squared test of independence on all combinatio

This is my first time posting here, and I hope this is all in the right place. I have been using R for basic statistical analysis for some time, but I haven't really used it for anything computationally challenging, and I'm very much a beginner on the programming/data manipulation side of R.

I have presence/absence (binary) data on 72 plant species in 323 plots in a single catchment. The dataframe is 323 rows, each representing a plot, with 72 columns, each representing a species. This is a sample of the first 4 columns (some row numbers are missing because the 323 plots are a subset of a larger number of preassigned plots, not all of which were surveyed):

> head(plots[,1:4])
 Agrostis.canina Agrostis.capillaris Alchemilla.alpina Anthoxanthum.odoratum
1               1                   0                 0                     0
3               0                   0                 0                     0
4               0                   0                 0                     0
5               0                   0                 0                     0
6               0                   0                 0                     0
8               0                   0                 0                     0

I want to determine whether any of the plant species in this catchment are associated with any others, and if so, whether that association is positive or negative. To do this I want to perform a chi-squared test of independence on each pair of species. I need to create a 2x2 contingency table for each species-by-species comparison, run a chi-squared test on each of those contingency tables, and save the output. Ultimately I would like to end up with a list or matrix of all species-by-species tests that shows whether each pair of species has a positive, negative, or no significant association. I'd also like to incorporate some code that only shows an association as positive if all expected values were greater than 5.

I have made a start by writing the following function:

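CHI <- function(sppx, sppy) {
  # Chi-squared test on the cross-tabulation of the two species
  test <- chisq.test(table(sppx, sppy))
  # Statistic, p-value, and sign of (observed - expected) in cell [2, 2]
  result <- c(test$statistic, test$p.value,
              sign((table(sppx, sppy) - test$expected)[2, 2]))
  return(result)
}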

This returns the following:

> CHI(plots$Agrostis.canina, plots$Agrostis.capillaris)

X-squared                             
1.095869e-27  1.000000e+00 -1.000000e+00 
Warning message:
In chisq.test(chitbl) : Chi-squared approximation may be incorrect

Now I'm trying to figure out a way to apply this function to each species-by-species combination in the data frame. I essentially want R to take each column, apply the CHI function to that column and each other column in sequence, and so on through all the columns, dropping each column from the data frame once it has been processed so that the same species pair is not tested twice. I have tried various approaches using "for" loops and "apply" functions, but have not been able to figure this out. I hope that is clear enough. Any help here would be much appreciated. I have looked online for existing solutions to this specific problem, but haven't been able to find any that really helped. If anyone could link me to an existing answer, that would also be great.

Tags: r chi-squared

3 Answers

成全新的幸福

Reply #2 · 2019-03-30 03:02

I think you are looking for something like this. I used the iris dataset.

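library(datasets)
# Each column of ind holds one pair of column indices
ind <- combn(ncol(iris), 2)
# Apply CHI to every pair of columns
lapply(seq_len(ncol(ind)), function(i) CHI(iris[, ind[1, i]], iris[, ind[2, i]]))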
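
To make the role of combn concrete, here is a minimal sketch on a toy input (4 column indices, chosen only for illustration). It shows that each unordered pair appears exactly once, so no pair of columns is tested twice, and it counts how many tests the 72 species columns in the question would need.

```r
# Toy illustration: all unordered pairs drawn from 4 column indices.
pairs <- combn(4, 2)   # 2 x 6 matrix; each column is one pair
print(pairs)

# Number of pairwise tests for the 72 species columns in the question.
n_pairs <- choose(72, 2)
print(n_pairs)   # 2556
```

With that many tests, some correction for multiple comparisons (for example via p.adjust) may be worth considering, although the question does not ask for one.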


ら.Afraid

Reply #3 · 2019-03-30 03:03

The R code below runs a chi-squared test of one given variable against every categorical (factor) column of an R data frame. One argument of the test is held fixed and is specified explicitly.

Define your variable: change df$variable1 to the factor variable you want to test, and df to the data frame that contains all the factor variables to be tested against df$variable1.

Define your data frame: a new table (df2, returned as a matrix) is created that holds the chi-squared statistic, degrees of freedom, and p-value for each comparison of the given variable with a column of df.

The code was adapted from similar Stack Overflow posts, none of which produced the outcome I wanted. It tabulates the statistic, degrees of freedom, and p-value for the variable against each column of the data frame. The "2" argument makes apply work column-wise; see the MARGIN argument of apply.

df2 <- t(round(cbind(apply(df, 2, function(x) {
  ch <- chisq.test(df$variable1, x)
  c(unname(ch$statistic), ch$parameter, ch$p.value )})), 3))
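
A self-contained variant of the same idea, as a sketch only: the toy data frame and its column names (variable1 to variable3) are made up for illustration. Unlike the code above, it loops over column names with sapply, so the data frame is not coerced to a character matrix by apply, and it skips the comparison of variable1 with itself.

```r
# Hypothetical toy data: three binary factors (names are illustrative only).
set.seed(1)
df <- data.frame(
  variable1 = factor(sample(c("a", "b"), 100, replace = TRUE)),
  variable2 = factor(sample(c("a", "b"), 100, replace = TRUE)),
  variable3 = factor(sample(c("a", "b"), 100, replace = TRUE))
)

# Test variable1 against every other column, one column at a time.
others <- setdiff(names(df), "variable1")
one_vs_all <- t(sapply(others, function(nm) {
  ch <- suppressWarnings(chisq.test(df$variable1, df[[nm]]))
  c(statistic = unname(ch$statistic),
    df        = unname(ch$parameter),
    p.value   = ch$p.value)
}))

# One row per tested column; columns are statistic, df, p.value.
print(round(one_vs_all, 3))
```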


看我几分像从前

Reply #4 · 2019-03-30 03:25

You need the combn function to find all the pairwise combinations of the columns, and then you apply your function to each pair, something like this:

apply(combn(1:ncol(plots), 2), 2, function(ind) CHI(plots[, ind[1]], plots[, ind[2]]))
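
The question also asks for a labelled table of positive, negative, or no significant association, with a check on expected counts. One possible way to get there is sketched below. Everything in it beyond combn and chisq.test is an assumption made for illustration: the toy data frame plots_demo and its species names, the 0.05 significance level, and the rule that an association is only reported when every expected count exceeds 5.

```r
# Hypothetical toy data standing in for `plots`: 200 plots x 4 species (0/1).
set.seed(42)
n <- 200
a <- rbinom(n, 1, 0.5)
plots_demo <- data.frame(
  sp.A = a,
  sp.B = ifelse(runif(n) < 0.9, a, 1 - a),  # mostly co-occurs with sp.A
  sp.C = rbinom(n, 1, 0.4),
  sp.D = rbinom(n, 1, 0.6)
)

pairs <- combn(names(plots_demo), 2)  # one column per species pair

rows <- lapply(seq_len(ncol(pairs)), function(i) {
  # Fixing the levels keeps the table 2 x 2 even if a species is never present.
  x <- factor(plots_demo[[pairs[1, i]]], levels = 0:1)
  y <- factor(plots_demo[[pairs[2, i]]], levels = 0:1)
  tab <- table(x, y)
  test <- suppressWarnings(chisq.test(tab))
  data.frame(
    species1 = pairs[1, i],
    species2 = pairs[2, i],
    statistic = unname(test$statistic),
    p.value = test$p.value,
    # Cell [2, 2] is joint presence: +1 means more co-occurrence than expected.
    direction = sign(tab[2, 2] - test$expected[2, 2]),
    min.expected = min(test$expected)
  )
})
results <- do.call(rbind, rows)

# Assumed decision rule: alpha = 0.05 and every expected count above 5.
alpha <- 0.05
results$association <- ifelse(
  results$p.value < alpha & results$min.expected > 5,
  ifelse(results$direction > 0, "positive", "negative"),
  "none"
)
print(results)
```

To try this on the real data, plots_demo would be replaced by plots. A species that is present in every plot or in none gives an expected count of zero, so the test returns NaN and the label comes out as NA; such columns would need to be dropped or handled separately.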
