I have two 4x4 matrices in Java: one holds observed counts and the other expected counts.
I need an automated way to calculate the p-value from the chi-square statistic between these two matrices; however, as far as I am aware, Java has no such function.
I can calculate the chi-square statistic and its p-value by reading the two matrices into R from .csv files and then using the chisq.test function as follows:
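```r
obs <- read.csv("obs.csv")
exp <- read.csv("exp.csv")
# Note: when the first argument is a matrix or data frame, chisq.test ignores
# the second argument and tests obs itself for independence
chisq.test(obs, exp)
```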
where the .csv files have the following format:
```text
A, C, G, T
A, 197.136, 124.32, 63.492, 59.052
C, 124.32, 78.4, 40.04, 37.24
G, 63.492, 40.04, 20.449, 19.019
T, 59.052, 37.24, 19.019, 17.689
```
Given these commands, R prints output of the form:
```text
X-squared = 20.6236, df = 9, p-value = 0.01443
```
which includes the p-value I was looking for.
Does anyone know of an efficient way to automate the following process?

1) Output my matrices from Java into .csv files
2) Load the .csv files into R
3) Call chisq.test on the data in R
4) Return the resulting p-value to Java
There are (at least) two ways of going about this.
Command Line & Scripts
You can execute R scripts from the command line with `Rscript.exe` (`Rscript` on Linux and macOS). E.g. in your script you would have:

Rather than creating CSVs in Java and having R read them, you should be able to pass the matrices straight to R as command-line arguments. I don't see the need to create CSVs and pass the data that way unless your matrices are quite big: there is a limit on the size of the command-line arguments you can pass, and it varies across operating systems.
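A minimal sketch of the argument-passing idea, assuming a hypothetical script `chisq.R`: the Java side puts the number of rows and the row-by-row values into two arguments, and the R side (shown only in a comment) splits them back into a matrix. The class and method names are illustrative, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds an Rscript command line that passes the matrix
// as command-line arguments instead of going through CSV files.
// The (hypothetical) chisq.R script could rebuild it with:
//   args <- commandArgs(trailingOnly = TRUE)
//   m <- matrix(as.numeric(strsplit(args[2], ",")[[1]]),
//               nrow = as.integer(args[1]), byrow = TRUE)
class RscriptCommand {
    static List<String> build(String rscript, String script, double[][] m) {
        // Values are listed row by row, matching byrow = TRUE on the R side
        StringJoiner values = new StringJoiner(",");
        for (double[] row : m) {
            for (double v : row) {
                values.add(Double.toString(v));
            }
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(rscript);
        cmd.add(script);
        cmd.add(Integer.toString(m.length)); // number of rows, so R can reshape the values
        cmd.add(values.toString());
        return cmd;
    }
}
```

For a 4x4 matrix the whole payload is a single short argument, far below typical command-line limits.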
You can pass arguments to R scripts and parse them with the `commandArgs()` function or with packages such as optparse or getopt. See this thread for more information.

There are several ways of calling a command and reading its output from Java. I don't know enough about them to give you detailed advice, but a bit of googling will turn up plenty of examples. Calling a script from the command line is done like this:
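On the Java side, `ProcessBuilder` is one standard way to do this. The sketch below (hypothetical class name) runs a command, merges standard error into standard output so that R's error messages are not lost, and returns the printed lines. The `Rscript` command and the script name in the usage line are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs an external command (for example Rscript) and
// returns everything it prints, one list entry per line.
class ProcessRunner {
    static List<String> runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so R's errors are captured too
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        // Read all output before waiting, so a full pipe buffer cannot block the child
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Command exited with status " + exit + ": " + lines);
        }
        return lines;
    }
}
```

Usage would look like `ProcessRunner.runAndCapture(Arrays.asList("Rscript", "chisq.R", "obs.csv", "exp.csv"))`, assuming `Rscript` is on the `PATH`.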
JRI
JRI lets you talk to R straight from Java. Here's an example of how you would pass a double array to R and have R sum it (this is Java now):
The function `assign()` here is the same as doing this in R:

You should be able to work out how to extend this to work with a matrix.
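One way to extend the example to a matrix, sketched under the assumption that you assign a flat `double[]` and rebuild the matrix in R: R fills matrices column by column, so the Java array has to be flattened column-major. The class and method names are hypothetical, and the JRI calls appear only in a comment because they need JRI and a local R installation; treat them as a rough outline rather than verified code.

```java
// Hypothetical helper for the JRI route: R stores matrices column by column,
// so a Java double[][] is flattened column-major before being assigned.
class JriMatrix {
    static double[] toColumnMajor(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[] out = new double[rows * cols];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                out[j * rows + i] = m[i][j];
            }
        }
        return out;
    }

    /** R code that rebuilds the matrix from the assigned vector and returns the p-value. */
    static String pValueExpression(String vectorName, int rows) {
        return "chisq.test(matrix(" + vectorName + ", nrow = " + rows + "))$p.value";
    }

    // With JRI on the classpath, usage would look roughly like this (not compiled here):
    //   Rengine re = new Rengine(new String[] {"--vanilla"}, false, null);
    //   re.assign("x", JriMatrix.toColumnMajor(observed));
    //   double p = re.eval(JriMatrix.pValueExpression("x", observed.length)).asDouble();
    //   re.end();
}
```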
JRI is quite difficult to get started with, so if you want to get this done quickly, the command-line option is probably best. Once it is set up, though, I would say the JRI approach is less messy, and if there is a lot of back and forth between R and Java, it is definitely better than calling multiple scripts.
RCaller 2.2 can do what you want. Suppose the frequency matrix is given as in your question. The resulting p.value and df variables can be calculated and returned using the code below:
The output is:
You can find the technical details here.
1) Outputting my matrices from JAVA into .csv files
Use any CSV library; I would recommend opencsv: http://opencsv.sourceforge.net/
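If you would rather avoid a dependency, a few lines of plain Java are enough for the layout in the question (a header of column labels, then one labelled row per matrix row). This is a minimal sketch with hypothetical class and method names. It writes plain commas without padding spaces; R's `read.csv` still uses the first column as row names because the header has one fewer field than the data rows.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a labelled square matrix in the question's CSV layout.
class MatrixCsv {
    /** Builds the CSV text: a header of column labels, then one labelled row per matrix row. */
    static String toCsv(String[] labels, double[][] m) {
        StringBuilder sb = new StringBuilder(String.join(",", labels)).append('\n');
        for (int i = 0; i < m.length; i++) {
            sb.append(labels[i]); // row label, read by R as the row name
            for (double v : m[i]) {
                sb.append(',').append(v);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Writes the CSV text to the given file, overwriting it if it exists. */
    static void write(Path file, String[] labels, double[][] m) throws IOException {
        Files.write(file, toCsv(labels, m).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, `MatrixCsv.write(Paths.get("obs.csv"), new String[] {"A", "C", "G", "T"}, observed)`, where `observed` is your 4x4 array.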
2) Uploading the .csv files into R 3) Calling the chisq.test on the .csv files into R
Steps 2 and 3 are pretty much the same: you had better create a parametrized script to be run in R.
So you can run
and use unique names for the CSV files, for example:
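A minimal sketch of the unique-names idea using `Files.createTempFile`, which picks an unused file name in the system temp directory. The class name and the script name are hypothetical, and the parametrized R script would be expected to read the two paths with `commandArgs(trailingOnly = TRUE)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: creates uniquely named CSV files for one run, so that
// concurrent runs never overwrite each other's input.
class UniqueCsvFiles {
    final Path observed;
    final Path expected;

    UniqueCsvFiles() throws IOException {
        // createTempFile picks a fresh, unused name in the system temp directory
        observed = Files.createTempFile("obs-", ".csv");
        expected = Files.createTempFile("exp-", ".csv");
    }

    /** Command line for a parametrized R script (hypothetical name) that takes both file paths. */
    List<String> rscriptCommand(String script) {
        return Arrays.asList("Rscript", script, observed.toString(), expected.toString());
    }

    /** Deletes both files once R has produced its result. */
    void delete() throws IOException {
        Files.deleteIfExists(observed);
        Files.deleteIfExists(expected);
    }
}
```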
And then you use
4) Returning the outputted p-value back into JAVA?

You can read R's output directly only if you start R from Java as a child process, e.g. with `Runtime.getRuntime().exec()`, and read that process's standard output.
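Once the output lines have been read, the p-value still has to be extracted. Below is a sketch (hypothetical class name) that matches the summary line shown in the question. R prints `p-value < 2.2e-16` for very small values, so in that case the parsed number is only an upper bound. Having the R script print just `chisq.test(...)$p.value` with `cat()` would avoid the parsing altogether.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the p-value out of R's printed chisq.test summary,
// e.g. "X-squared = 20.6236, df = 9, p-value = 0.01443".
class PValueParser {
    // Accepts both "=" and "<" (R uses "<" below its display threshold)
    private static final Pattern P_VALUE =
            Pattern.compile("p-value\\s*[=<]\\s*([0-9.]+(?:[eE][-+]?[0-9]+)?)");

    static double parse(List<String> outputLines) {
        for (String line : outputLines) {
            Matcher m = P_VALUE.matcher(line);
            if (m.find()) {
                return Double.parseDouble(m.group(1));
            }
        }
        throw new IllegalArgumentException("No p-value found in R output: " + outputLines);
    }
}
```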
I would also recommend taking a look at Apache's statistics library and at "How to calculate PValue from ChiSquare". Maybe you can live without R altogether :)
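If you do want to drop R, the test itself is small. The sketch below swaps in a plain-Java computation (hypothetical class name): Pearson's statistic, expected counts from the row and column margins (as `chisq.test` computes them for a matrix), and the upper-tail probability via the regularized incomplete gamma function, evaluated with the classic series/continued-fraction split from Numerical Recipes. It applies no Yates continuity correction, which R uses by default only for 2x2 tables. If a dependency is acceptable, Apache Commons Math's `ChiSquareTest` class offers a ready-made version.

```java
// Hypothetical pure-Java replacement for the R call: Pearson's chi-square
// statistic and its p-value, without any external library.
final class ChiSquare {
    private ChiSquare() {}

    /** Expected counts under independence: rowSum * colSum / total. */
    static double[][] expectedFromMargins(double[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowSum = new double[r];
        double[] colSum = new double[c];
        double total = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowSum[i] += obs[i][j];
                colSum[j] += obs[i][j];
                total += obs[i][j];
            }
        }
        double[][] exp = new double[r][c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                exp[i][j] = rowSum[i] * colSum[j] / total;
            }
        }
        return exp;
    }

    /** Pearson statistic: sum over all cells of (O - E)^2 / E. */
    static double statistic(double[][] obs, double[][] exp) {
        double sum = 0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                double diff = obs[i][j] - exp[i][j];
                sum += diff * diff / exp[i][j];
            }
        }
        return sum;
    }

    /** Upper-tail probability P(X >= stat) for a chi-square variable with df degrees of freedom. */
    static double pValue(double stat, int df) {
        if (stat <= 0) {
            return 1.0;
        }
        return upperRegularizedGamma(df / 2.0, stat / 2.0);
    }

    // log Gamma(a) for a positive integer or half-integer a (always the case for df / 2.0)
    private static double logGamma(double a) {
        boolean integer = a == Math.rint(a);
        double result = integer ? 0.0 : 0.5 * Math.log(Math.PI); // Gamma(1/2) = sqrt(pi)
        for (double z = integer ? 1.0 : 0.5; z < a; z += 1.0) {
            result += Math.log(z); // Gamma(z + 1) = z * Gamma(z)
        }
        return result;
    }

    // Q(a, x) = Gamma(a, x) / Gamma(a): series for x < a + 1, continued fraction otherwise
    private static double upperRegularizedGamma(double a, double x) {
        final double eps = 1e-14, tiny = 1e-300;
        double prefactor = Math.exp(-x + a * Math.log(x) - logGamma(a));
        if (x < a + 1) {
            double term = 1.0 / a, sum = term, ap = a;
            for (int n = 0; n < 1000 && Math.abs(term) > Math.abs(sum) * eps; n++) {
                ap += 1;
                term *= x / ap;
                sum += term;
            }
            return 1.0 - prefactor * sum; // series gives the lower tail P(a, x)
        }
        double b = x + 1 - a, c = 1 / tiny, d = 1 / b, h = d; // modified Lentz method
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2;
            d = an * d + b;
            if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < eps) break;
        }
        return prefactor * h;
    }
}
```

For a 4x4 table tested for independence, df = (4 - 1) * (4 - 1) = 9, which matches the R output in the question, and `ChiSquare.pValue(20.6236, 9)` comes out at about 0.01443.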
I recommend simply using a Java library that performs the chi-square test for you. There are plenty of them:
This is not a complete list, but it is what I found after 5 minutes of searching.
Check out the JRI page.
Description from their site:
Rserve is another way to get your data from Java to R and back. It is a server that takes R code as string input. You can use some string conversion in Java to turn the matrices into strings that can be passed to R.
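A sketch of that conversion, with a hypothetical class name: the matrix is written as an R `matrix(c(...), nrow = ..., byrow = TRUE)` expression that can be embedded in the string sent to Rserve. The client calls in the comment assume Rserve's Java client (`RConnection`) and are an outline only, not verified code.

```java
import java.util.StringJoiner;

// Hypothetical helper for the Rserve route: turns a Java matrix into R source
// text that can be embedded in a string to evaluate.
class RMatrixLiteral {
    static String toR(double[][] m) {
        StringJoiner values = new StringJoiner(", ", "c(", ")");
        for (double[] row : m) {
            for (double v : row) {
                values.add(Double.toString(v)); // assumes finite values (no NaN/Infinity)
            }
        }
        // byrow = TRUE because the values above are listed row by row
        return "matrix(" + values + ", nrow = " + m.length + ", byrow = TRUE)";
    }

    // With Rserve running and its Java client on the classpath, roughly (not compiled here):
    //   RConnection c = new RConnection();
    //   double p = c.eval("chisq.test(" + RMatrixLiteral.toR(observed) + ")$p.value").asDouble();
    //   c.close();
}
```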
Here is some more information on Rserve. Incidentally, this is also how Tableau communicates with R through its R connection.
https://cran.r-project.org/web/packages/Rserve/index.html