I am new to R. For my project, I need to run a chi-squared test on a hundred thousand entries.
I taught myself for a few days and wrote some code that runs chisq.test in a loop. The code:
the.data = read.table("test_chisq_allelefrq.txt", header=T, sep="\t", row.names=1)
p = c()
ID = c()
for (i in 1:nrow(the.data)) {
  data.row = the.data[i,]
  # 3 x 3 table: one cohort per row, genotypes AA/AB/BB in columns
  data.matrix = matrix(c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB,
                         data.row$cohort_2_AA, data.row$cohort_2_AB, data.row$cohort_2_BB,
                         data.row$cohort_3_AA, data.row$cohort_3_AB, data.row$cohort_3_BB),
                       byrow=T, nrow=3)
  chisq = chisq.test(data.matrix)
  pvalue = chisq$p.value
  p = c(p, pvalue)
  # collect the row name (SNP ID) that belongs to this p-value
  ID = c(ID, row.names(the.data)[i])
}
results = data.frame(ID, p)
write.table(results, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t ", eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)
This code may have several problems, but it works.
However, it runs very slowly.
I tried to improve it by using "apply".
My plan was to call apply twice instead of using the "for" loop:
datarow= apply (the.data,1, matrix(the.data, byrow=T, nrow=3))
result=apply(datarow,1,chisq.test)
However, I get an error saying that matrix is not a function. Also, the chisq.test output is a list, so I cannot use write.table to output the data.
the.data looks like this:
each row is an ID such as SN0001 followed by 9 numbers.
cohort_1_AA cohort_1_AB cohort_1_BB cohort_2_AA cohort_2_AB cohort_2_BB cohort_3_AA cohort_3_AB cohort_3_BB
SN0001 197 964 1088 877 858 168 351 435 20
....
....
I have been trying for days and nights. Hope someone can help me. Thank you very much.
With the apply family of functions, it is easiest to define your own function first and then apply it. Let's do that.
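A minimal sketch of that idea, assuming the nine counts in every row are ordered cohort by cohort as in the sample shown in the question. The helper name row_pvalue is made up for illustration, and the one-row data frame stands in for the real file:

```r
# Hypothetical helper: p-value of a chi-squared test on one row of counts.
# Assumes the 9 values are ordered cohort_1_AA, cohort_1_AB, ..., cohort_3_BB.
row_pvalue <- function(counts) {
  m <- matrix(as.numeric(counts), nrow = 3, byrow = TRUE)  # one cohort per row
  chisq.test(m)$p.value
}

# Stand-in data: the single sample row from the question.
the.data <- data.frame(
  cohort_1_AA = 197, cohort_1_AB = 964, cohort_1_BB = 1088,
  cohort_2_AA = 877, cohort_2_AB = 858, cohort_2_BB = 168,
  cohort_3_AA = 351, cohort_3_AB = 435, cohort_3_BB = 20,
  row.names = "SN0001"
)

# Apply the helper to every row (MARGIN = 1); the result is a numeric vector.
p <- apply(the.data, 1, row_pvalue)
results <- data.frame(ID = rownames(the.data), p = p)
```

Because the helper returns only the p-value rather than the whole chisq.test list, results is an ordinary data frame that write.table can handle.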
Try this code; hopefully it works. Please accept the answer if it works for you. Thanks!
One for loop implies one apply, not two. Something like this:
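A sketch of what a single apply call could look like, not necessarily the exact code originally posted here. It assumes the column order from the question, and it reads the sample row from an inline string so that it runs without the input file:

```r
# Stand-in for read.table("test_chisq_allelefrq.txt", ...): the sample row
# from the question, supplied as inline text.
txt <- paste(
  "cohort_1_AA cohort_1_AB cohort_1_BB cohort_2_AA cohort_2_AB cohort_2_BB cohort_3_AA cohort_3_AB cohort_3_BB",
  "SN0001 197 964 1088 877 858 168 351 435 20",
  sep = "\n"
)
the.data <- read.table(text = txt, header = TRUE, row.names = 1)

# One apply over the rows replaces the whole for loop:
# build the 3 x 3 table for the row and keep only the p-value.
p <- apply(the.data, 1, function(x) {
  chisq.test(matrix(x, nrow = 3, byrow = TRUE))$p.value
})

results <- data.frame(ID = rownames(the.data), p = p)
```

Since results is a plain data frame, the write.table call from the question can be reused on it unchanged.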
If the result is more readable than the for loop, go with it. Otherwise stick with what you have. apply won't be noticeably different in speed (faster or slower).
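One rough way to check the speed claim on your own machine is to time both versions with system.time. The sketch below uses synthetic counts generated with rpois, made up purely for timing, and a loop that preallocates its result vector, which differs from the loop in the question that grows p with c():

```r
set.seed(1)
# Synthetic data (an assumption, not real genotype counts): 2000 rows of 9 counts.
counts <- matrix(rpois(2000 * 9, lambda = 500), ncol = 9)

# p-value for one row of 9 counts arranged as a 3 x 3 table.
pval <- function(x) chisq.test(matrix(x, nrow = 3, byrow = TRUE))$p.value

# Version 1: for loop with a preallocated result vector.
t_loop <- system.time({
  p_loop <- numeric(nrow(counts))
  for (i in seq_len(nrow(counts))) p_loop[i] <- pval(counts[i, ])
})

# Version 2: a single apply over the rows.
t_apply <- system.time(p_apply <- apply(counts, 1, pval))

# Elapsed seconds for each version; the numbers depend on the machine.
print(c(loop = t_loop[["elapsed"]], apply = t_apply[["elapsed"]]))
```

Both versions call chisq.test once per row, so any difference between them comes only from the looping overhead.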