I would like to call an R script from Java. I have searched Google on the topic, but almost all of the results I have seen would require me to add a dependency on some third-party library. Can anyone show me a good way to accomplish the same thing without adding any dependencies to my code?
I am using a Windows machine, so perhaps I could use the command line to start R (if it is not already open) and run a specific R script. But I have never written command-line code (or called it from Java), so I would need code examples.
I am including working sample code below for one possible approach, using my command-line idea. As the inline comments show, Step Three in AssembleDataFile.java was intentionally left blank. If you think you can make the command-line idea work, please show me what code to write in Step Three.
Also, feel free to suggest another approach that, hopefully, does not involve adding any more dependencies to my code.
And, as always, I very much appreciate any links you might post to articles/tutorials/etc related to this question.
Here is what I have so far:
AssembleDataFile.java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class AssembleDataFile {
    static String delimiter;
    static String localPath = "C:\\test\\cr\\";
    static String[][] myDataArray;

    public static void main(String[] args) {
        String inputPath = localPath + "pd\\";
        String fileName = "MSData.txt";
        delimiter = "\\t";
        // Step One: Import data in two parts
        try {
            // 1A: get length of data file
            BufferedReader br1 = new BufferedReader(new FileReader(inputPath + fileName));
            int numRows = 0;
            int numCols = 0;
            String currentRow;
            while ((currentRow = br1.readLine()) != null) {
                numRows += 1;
                numCols = currentRow.split(delimiter).length;
            }
            br1.close();
            // 1B: populate data into array
            myDataArray = new String[numRows][numCols + 1];
            BufferedReader br2 = new BufferedReader(new FileReader(inputPath + fileName));
            String eachRow;
            int rowIdx = 0;
            while ((eachRow = br2.readLine()) != null) {
                String[] splitRow = eachRow.split(delimiter);
                for (int z = 0; z < splitRow.length; z++) {
                    myDataArray[rowIdx][z] = splitRow[z];
                }
                rowIdx += 1;
            }
            br2.close();
            // Step Two: Write data to csv
            String rPath = localPath + "r\\";
            String sFileName = rPath + "2colData.csv";
            PrintWriter outputWriter = new PrintWriter(sFileName);
            for (int q = 0; q < myDataArray.length; q++) {
                outputWriter.println(myDataArray[q][8] + ", " + myDataArray[q][9]);
            }
            outputWriter.close();
            // Step Three: Call R script named My_R_Script.R that uses 2ColData.csv as input
            // not sure how to write this code. Can anyone help me write this part?
            // For what it is worth, one of the R scripts that I intend to call is included below
            //
            // added the following lines here, per Vincent's suggestion:
            String rScriptFileName = rPath + "My_R_Script.R";
            Runtime.getRuntime().exec("mypathto\\R\\bin\\Rscript " + rScriptFileName);
            //
            //
            // Step Four: Import data from R and put it into myDataArray's empty last column
            try {
                Thread.sleep(30000); // make this thread sleep for 30 seconds while R creates the needed file
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            String matchFileName = rPath + "Matches.csv";
            BufferedReader br3 = new BufferedReader(new FileReader(matchFileName));
            String thisRow;
            int rowIndex = 0;
            while ((thisRow = br3.readLine()) != null) {
                String[] splitRow = thisRow.split(delimiter);
                myDataArray[rowIndex][numCols] = splitRow[0];
                rowIndex += 1;
            }
            br3.close();
            // Step Five: Check work by printing out one row from myDataArray
            // Note that the printout has one more column than the input file had.
            for (int u = 0; u <= numCols; u++) {
                System.out.println(String.valueOf(myDataArray[1][u]));
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }
}
My_R_Script.R
library(sp)        # provides SpatialPoints and overlay
library(maptools)  # provides readShapeSpatial
myCSV <- read.csv(file="2colData.csv", header=TRUE, sep=",")
pts <- SpatialPoints(myCSV)
ZipCodes <- readShapeSpatial("mypath/myshapefile.shp")
write.csv(ZipCodes$F[overlay(pts, ZipCodes)], "Matches.csv", quote=FALSE, row.names=FALSE)
EDIT:
Here is the error message that is being thrown when I add Runtime.getRuntime().exec("Rscript "+rScriptFileName); to the code above:
java.io.IOException: Cannot run program "Rscript": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at AssembleDataFile.main(AssembleDataFile.java:52)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    ... 5 more
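For context, CreateProcess error=2 is Windows reporting that it could not find a program with the given name, which is consistent with the directory containing Rscript not being on the PATH. The sketch below shows one way that could be checked from Java before launching anything. The class name ExecutableFinder and the method findInPath are hypothetical names introduced only for this illustration, and the ".exe" suffix check is an assumption about how the program is named on Windows.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the code above.
class ExecutableFinder {

    /**
     * Looks for exeName (or exeName + ".exe") in each directory listed in
     * pathValue, which is expected to use the platform's path separator.
     */
    static Optional<Path> findInPath(String exeName, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String candidate : new String[] {exeName, exeName + ".exe"}) {
                try {
                    Path p = Paths.get(dir, candidate);
                    if (Files.isRegularFile(p)) {
                        return Optional.of(p);
                    }
                } catch (InvalidPathException e) {
                    // Skip PATH entries that are not valid paths (e.g. stray quotes).
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> rscript = findInPath("Rscript", System.getenv("PATH"));
        System.out.println(rscript.isPresent()
                ? "Found " + rscript.get()
                : "Rscript is not on the PATH");
    }
}
```

If the lookup comes back empty, the options are to add R's bin directory to the PATH or to pass the full path to Rscript when starting the process.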
SECOND EDIT: The code above now works because I followed Vincent's suggestions. However, I had to add a sleep call to give the R script enough time to run. Without it, the Java code above throws an error saying that the Matches.csv file does not exist. I am concerned that a fixed 30-second sleep is too blunt an instrument: it wastes time when R finishes early and fails when R takes longer. Can anyone show me code that makes the Java program wait until the R program has finished creating Matches.csv? I hesitate to use threading tools because I have read that poorly designed threads can cause bugs that are nearly impossible to localize and fix.
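One possible way to replace the fixed sleep, without extra threads or third-party dependencies, is to keep a reference to the started process and call Process.waitFor(), which blocks the calling thread until the child process exits. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name RScriptRunner and the method runAndWait are hypothetical, and the working directory is assumed to be the folder where the R script should read 2colData.csv and write Matches.csv.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the code above.
class RScriptRunner {

    /**
     * Starts the given command with workingDir as its working directory,
     * blocks until the process exits, and returns its exit code.
     */
    static int runAndWait(List<String> command, File workingDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workingDir); // the child process starts in this directory
        pb.inheritIO();           // child output goes to this console instead of an unread pipe
        Process process = pb.start();
        return process.waitFor(); // blocks until the child process terminates
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: RScriptRunner <path to Rscript> <script.R> <working dir>");
            return;
        }
        int exitCode = runAndWait(Arrays.asList(args[0], args[1]), new File(args[2]));
        System.out.println("Rscript exited with code " + exitCode);
    }
}
```

In Step Three this would stand in for the Runtime.getRuntime().exec(...) call, with the Rscript path and rScriptFileName passed as separate list elements and new File(rPath) as the working directory, after which the Thread.sleep in Step Four should no longer be needed. A non-zero exit code would suggest that the R script failed, so Matches.csv may not exist and is worth checking for before reading it.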