I am new to R. For my project, I need to run a chi-squared test on several hundred thousand entries.
I taught myself R for a few days and wrote some code that runs chisq.test in a loop. The code:
the.data = read.table ("test_chisq_allelefrq.txt", header=T, sep="\t",row.names=1)
p=c()
ID=c()
for (i in 1:nrow(the.data)) {
  data.row = the.data[i,]
  # Build a 3 x 3 table: one row per cohort, one column per genotype (AA, AB, BB).
  data.matrix = matrix(c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB, data.row$cohort_2_AA, data.row$cohort_2_AB, data.row$cohort_2_BB, data.row$cohort_3_AA, data.row$cohort_3_AB, data.row$cohort_3_BB), byrow=T, nrow=3)
  chisq = chisq.test(data.matrix)
  pvalue = chisq$p.value
  p = c(p, pvalue)
  # Record the row name (the SNP ID) alongside its p-value.
  No = row.names(the.data)[i]
  ID = c(ID, No)
}
results=data.frame(ID,p)
write.table(results, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)
This code may have several problems, but it works.
However, it runs very slowly.
I tried to improve it by using "apply".
My plan was to call apply twice instead of using a "for" loop:
datarow= apply (the.data,1, matrix(the.data, byrow=T, nrow=3))
result=apply(datarow,1,chisq.test)
However, I get an error saying that matrix is not a function. Also, the output of chisq.test is a list, so I cannot use write.table to write out the results.
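For reference, the "not a function" error comes from the third argument of apply: it has to be a function, whereas matrix(the.data, byrow=T, nrow=3) is a call that is evaluated before apply ever sees it. Below is a minimal sketch of a single-pass apply that also sidesteps the list problem by keeping only the p-value. It assumes the nine count columns are in the order shown in the sample data. The helper name row_pvalue and the toy data frame are hypothetical, made up for illustration only.

```r
# Toy stand-in for the.data. The SN0001 row is the sample row from the post;
# the SN0002 row is invented purely so there is more than one row to loop over.
the.data <- data.frame(
  cohort_1_AA = c(197, 210), cohort_1_AB = c(964, 950), cohort_1_BB = c(1088, 1100),
  cohort_2_AA = c(877, 860), cohort_2_AB = c(858, 870), cohort_2_BB = c(168, 175),
  cohort_3_AA = c(351, 340), cohort_3_AB = c(435, 440), cohort_3_BB = c(20, 25),
  row.names = c("SN0001", "SN0002")
)

# Hypothetical helper: reshape one row of nine counts into a 3 x 3 table
# (one row per cohort) and return only the p-value of the test.
row_pvalue <- function(counts) {
  chisq.test(matrix(counts, nrow = 3, byrow = TRUE))$p.value
}

# apply() over margin 1 hands each row to the helper as a numeric vector.
# Because the helper returns a single number, the result is a plain numeric
# vector named by the row names, not a list of test objects.
p <- apply(as.matrix(the.data), 1, row_pvalue)

# A two-column data frame like this can be passed straight to write.table.
results <- data.frame(ID = names(p), p = unname(p))
```

Returning just the p-value from the helper is a design choice for this sketch; if the statistic is needed as well, the helper could return a short numeric vector instead, at the cost of apply returning a matrix that has to be transposed.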
the.data looks like this:
each row is an ID such as SN0001 followed by 9 counts.
cohort_1_AA cohort_1_AB cohort_1_BB cohort_2_AA cohort_2_AB cohort_2_BB cohort_3_AA cohort_3_AB cohort_3_BB
SN0001 197 964 1088 877 858 168 351 435 20
....
....
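To make the layout concrete, here is a small sketch of how the SN0001 row maps onto the 3 x 3 table that chisq.test receives, and how individual pieces of the result can be pulled out of the returned list. The dimnames are labels added here for readability and are an assumption about how the columns should be interpreted (rows as cohorts, columns as genotypes).

```r
# Counts for SN0001, copied from the sample row above.
counts <- c(197, 964, 1088, 877, 858, 168, 351, 435, 20)

# byrow = TRUE fills the table one cohort at a time, so each row is a cohort
# and each column is a genotype.
tab <- matrix(counts, nrow = 3, byrow = TRUE,
              dimnames = list(cohort = c("cohort_1", "cohort_2", "cohort_3"),
                              genotype = c("AA", "AB", "BB")))

test <- chisq.test(tab)

# chisq.test returns a list of class "htest"; its elements are accessed by name.
test$statistic  # chi-squared statistic
test$parameter  # degrees of freedom
test$p.value    # p-value
```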
I have been working on this day and night for several days. I hope someone can help me. Thank you very much.