How to run chisq.test in a loop using apply

Posted 2019-09-06 06:53

I am new to R. For my project, I need to run a chi-squared test on about a hundred thousand entries.

I have been teaching myself for a few days and wrote some code that runs chisq.test in a loop:

the.data = read.table("test_chisq_allelefrq.txt", header=T, sep="\t", row.names=1)
p = c()
ID = c()
for (i in 1:nrow(the.data)) {
  data.row = the.data[i,]
  # 3 x 3 table of genotype counts: one row per cohort, columns AA, AB, BB
  data.matrix = matrix(c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB,
                         data.row$cohort_2_AA, data.row$cohort_2_AB, data.row$cohort_2_BB,
                         data.row$cohort_3_AA, data.row$cohort_3_AB, data.row$cohort_3_BB),
                       byrow=T, nrow=3)
  chisq = chisq.test(data.matrix)
  pvalue = chisq$p.value
  p = c(p, pvalue)
  No = row.names(the.data)[i]
  # append the current row name (the original line referenced undefined names rsid and SNP)
  ID = c(ID, No)
}
results = data.frame(ID, p)
write.table(results, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)

This code may have several problems, but it works.

However, it runs very slowly.

I tried to improve it by using apply.

My plan was to use apply twice instead of the for loop:

datarow= apply (the.data,1,  matrix(the.data, byrow=T, nrow=3))
result=apply(datarow,1,chisq.test)

However, I get an error saying that the matrix is not a function. Also, the chisq.test output is a list, so I cannot use write.table to write out the data.

the.data looks like this. Each row is an ID such as SN0001 followed by 9 counts:

        cohort_1_AA  cohort_1_AB  cohort_1_BB  cohort_2_AA  cohort_2_AB  cohort_2_BB  cohort_3_AA  cohort_3_AB  cohort_3_BB
SN0001          197          964         1088          877          858          168          351          435           20
....
....

I have been working on this day and night. I hope someone can help me. Thank you very much.

2 Answers
倾城 Initia
#2 · 2019-09-06 06:54

With the apply family of functions, it is easiest to first define your own function and then apply it. Let's do that.

    ## First define the function to apply
    Chsq <- function(x) {
        ## Input is one row of your data;
        ## reshape it into a 3 x 3 table (one row per cohort)
        x <- matrix(x, byrow = TRUE, nrow = 3)
        ## Return only the p-value
        return(chisq.test(x)$p.value)
    }
    ## Now apply this function to every row
    data = read.table("test_chisq_allelefrq.txt", header=T, sep="\t", row.names=1)
    ## as.vector drops the names, leaving a plain vector of p-values
    P_Values <- as.vector(apply(data, 1, Chsq))
    results <- cbind(ID = rownames(data), P_Values)
    write.table(results, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)

Try this code; hopefully it works! :) Please accept the answer if it works for you. Thanks.
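
For reference, the error in the question most likely arises because the third argument of apply must be a function, while matrix(the.data, byrow=T, nrow=3) is a call that is evaluated before apply ever sees it. The following is a minimal, self-contained sketch of the approach above that runs without the input file. It uses the SN0001 row from the question plus one made-up row (SN0002 and its counts are hypothetical, for illustration only), and the helper name row_p_value is also just an illustrative choice.

```r
# Toy stand-in for the.data: row SN0001 is from the question,
# row SN0002 is made up for illustration only.
the.data <- data.frame(
  cohort_1_AA = c(197, 210), cohort_1_AB = c(964, 950), cohort_1_BB = c(1088, 1089),
  cohort_2_AA = c(877, 300), cohort_2_AB = c(858, 900), cohort_2_BB = c(168, 703),
  cohort_3_AA = c(351, 100), cohort_3_AB = c(435, 400), cohort_3_BB = c(20, 306),
  row.names = c("SN0001", "SN0002")
)

# Row-wise helper (hypothetical name): reshape the 9 counts into a 3 x 3 table,
# one row per cohort, and keep only the p-value from the object that
# chisq.test returns.
row_p_value <- function(x) {
  chisq.test(matrix(x, nrow = 3, byrow = TRUE))$p.value
}

# apply() hands each row to the function as a numeric vector; the names of
# the result are the row names of the.data.
p <- apply(the.data, 1, row_p_value)

# A plain data frame of IDs and p-values can be passed to write.table.
results <- data.frame(ID = names(p), p = unname(p))
print(results)
```

Because only the p.value element is kept, the result is an ordinary numeric vector rather than a list, which is what makes the write.table step straightforward.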

冷血范
#3 · 2019-09-06 07:04

One for loop implies one apply, not two.

Something like this:

result=apply(the.data, 1, function(data.row) {
   ## Your code using data.row
})

If the result is more readable than the for loop, go with it. Otherwise stick with what you have. apply won't be noticeably different in speed (faster or slower).
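
To make the "one loop, one apply" point concrete, here is a small sketch that runs both forms on the same toy input and checks that they agree. The matrix counts is a stand-in for the.data: the SN0001 row is from the question, while SN0002 is made up purely for illustration. The loop version also preallocates its result vector instead of growing it with c(p, pvalue), which is one cost of the original loop that can be avoided without switching to apply.

```r
# Toy stand-in for the.data (row SN0001 from the question, SN0002 made up).
counts <- rbind(
  SN0001 = c(197, 964, 1088, 877, 858, 168, 351, 435, 20),
  SN0002 = c(210, 950, 1089, 300, 900, 703, 100, 400, 306)
)

# Version 1: a for loop over rows, with the result vector preallocated
# rather than extended on every iteration.
p_loop <- numeric(nrow(counts))
for (i in seq_len(nrow(counts))) {
  tab <- matrix(counts[i, ], nrow = 3, byrow = TRUE)
  p_loop[i] <- chisq.test(tab)$p.value
}

# Version 2: the same loop body inside a single apply over rows.
p_apply <- apply(counts, 1, function(data.row) {
  chisq.test(matrix(data.row, nrow = 3, byrow = TRUE))$p.value
})

# Both versions compute the same p-values.
stopifnot(isTRUE(all.equal(p_loop, unname(p_apply))))
```

Which version is faster on the real data is something to measure rather than assume, for example by wrapping each one in system.time().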
