How to run chisq.test in a loop using apply

Posted 2019-09-06 07:01

Question:

I am new to R. For my project, I need to run a chi-squared test on some hundred thousand entries.

I taught myself for a few days and wrote some code that runs chisq.test in a loop. The code:

the.data = read.table("test_chisq_allelefrq.txt", header=T, sep="\t", row.names=1)
p = c()
ID = c()
for (i in 1:nrow(the.data)) {
  data.row = the.data[i,]
  # 3x3 table: one row per cohort, one column per genotype (AA, AB, BB)
  data.matrix = matrix(c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB,
                         data.row$cohort_2_AA, data.row$cohort_2_AB, data.row$cohort_2_BB,
                         data.row$cohort_3_AA, data.row$cohort_3_AB, data.row$cohort_3_BB),
                       byrow=T, nrow=3)
  chisq = chisq.test(data.matrix)
  pvalue = chisq$p.value
  p = c(p, pvalue)
  # collect the row name (the SNP ID) alongside its p-value
  ID = c(ID, row.names(the.data)[i])
}
results = data.frame(ID, p)
write.table(results, file="chisq-test_output.txt", append=F, quote=F, sep="\t", eol="\n", na="NA", dec=".", row.names=F, col.names=T)

This code may have several problems, but it works.

However, it runs very slowly.

I tried to improve it by using "apply".

My plan was to call apply twice instead of using "for":

datarow= apply (the.data,1,  matrix(the.data, byrow=T, nrow=3))
result=apply(datarow,1,chisq.test)

However, I get an error saying that matrix is not a function. Also, the chisq.test output is a list, so I cannot use write.table to write out the data.

the.data looks like this: each row has an ID (e.g. SN0001) followed by 9 numbers.

           cohort_1_AA cohort_1_AB cohort_1_BB cohort_2_AA cohort_2_AB cohort_2_BB cohort_3_AA cohort_3_AB cohort_3_BB
SN0001             197         964        1088         877         858         168         351         435          20
....
....

I have been trying for days and nights. Hope someone can help me. Thank you very much.

Answer 1:

With the apply family of functions, it is easiest to first define your own function and then apply it. Let's do that.

## First define the function to apply
Chsq <- function(x) {
  ## input is one row of your data
  ## build a 3x3 table from the row
  x <- matrix(x, byrow = TRUE, nrow = 3)
  ## return the p-value
  return(chisq.test(x)$p.value)
}
## Now apply this function to every row
data = read.table("test_chisq_allelefrq.txt", header=T, sep="\t", row.names=1)
## as.vector drops the names from the apply output
P_Values <- as.vector(apply(data, 1, Chsq))
## keep the row IDs next to the p-values, with named columns for the header
results <- data.frame(ID = rownames(data), P_Values)
write.table(results, file="chisq-test_output.txt", append=F, quote=F, sep="\t", eol="\n", na="NA", dec=".", row.names=F, col.names=T)

Try this code; hopefully it works. Please accept the answer if it works for you. Thanks!
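
The question also mentions that chisq.test returns a list, which write.table cannot handle directly. One way around that is to have the row function return only the numbers of interest. The following is a minimal sketch, not a definitive implementation: the helper name chisq_row and the SN0002 counts are made up for illustration (only the SN0001 row comes from the question), and the data frame is built in memory instead of being read from the file.

```r
# Hypothetical genotype counts; only the SN0001 row comes from the question.
the.data <- data.frame(
  cohort_1_AA = c(197, 120), cohort_1_AB = c(964, 300), cohort_1_BB = c(1088, 580),
  cohort_2_AA = c(877, 130), cohort_2_AB = c(858, 310), cohort_2_BB = c(168, 560),
  cohort_3_AA = c(351, 110), cohort_3_AB = c(435, 290), cohort_3_BB = c(20, 600),
  row.names = c("SN0001", "SN0002")
)

# Hypothetical helper: return the test statistic and p-value for one row of nine counts.
chisq_row <- function(x) {
  test <- chisq.test(matrix(x, nrow = 3, byrow = TRUE))
  c(statistic = unname(test$statistic), p = test$p.value)
}

# apply() returns one column per input row here, so transpose to get one row per SNP.
stats <- t(apply(as.matrix(the.data), 1, chisq_row))

# A plain data frame of IDs and numbers, which write.table can write out as usual.
results <- data.frame(ID = rownames(stats), stats, row.names = NULL)
print(results)
```

Because results is an ordinary data frame, it can be passed to the same write.table call used above.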



Answer 2:

One for loop implies one apply, not two.

Something like this:

result=apply(the.data, 1, function(data.row) {
   ## Your code using data.row
})

If the result is more readable than the for loop, go with it. Otherwise stick with what you have. apply won't be noticeably different in speed (faster or slower).
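
To make the skeleton concrete, here is a rough sketch that fills in the function body and checks it against a for loop. It rests on a few assumptions: the counts are hypothetical and built in memory, row_p is a made-up helper name, and the loop allocates its result vector once instead of growing it with c(). It also shows why the attempt in the question failed: the third argument of apply has to be a function, whereas matrix(the.data, ...) is evaluated to a matrix before apply ever sees it.

```r
# Hypothetical counts for illustration; real data would come from read.table().
the.data <- data.frame(
  cohort_1_AA = c(50, 40, 30), cohort_1_AB = c(100, 90, 80), cohort_1_BB = c(150, 140, 130),
  cohort_2_AA = c(60, 45, 90), cohort_2_AB = c(95, 85, 70), cohort_2_BB = c(145, 150, 60),
  cohort_3_AA = c(55, 42, 35), cohort_3_AB = c(105, 88, 75), cohort_3_BB = c(140, 138, 120),
  row.names = c("SN0001", "SN0002", "SN0003")
)

# Hypothetical helper: build the 3x3 cohort-by-genotype table and return its p-value.
# unlist() lets it accept either a numeric vector or a one-row data frame.
row_p <- function(data.row) {
  chisq.test(matrix(as.numeric(unlist(data.row)), nrow = 3, byrow = TRUE))$p.value
}

# apply() version: the third argument is a function, not a matrix(...) call.
p_apply <- apply(as.matrix(the.data), 1, row_p)

# for-loop version with the result vector allocated once up front.
p_loop <- numeric(nrow(the.data))
for (i in seq_len(nrow(the.data))) {
  p_loop[i] <- row_p(the.data[i, ])
}

# Both approaches should give the same p-values.
same <- isTRUE(all.equal(unname(p_apply), p_loop))
print(same)

results <- data.frame(ID = rownames(the.data), p = p_loop)
print(results)
```

Either form does the same work per row, so the choice between them is mostly about readability, as the answer says.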