I have to read a file from each folder in a list of folders and store the data in R.
I use the following code for my test data and it works. When I use the same code on the actual data,
I get this error:
Error: OutOfMemoryError (Java): GC overhead limit exceeded
Called from: top level
This is what I have done for my test data:
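library(xlsx)
library(plyr)

# Build the path to sandesh1.xlsx inside every subfolder ([-1] drops the parent itself)
parent.folder <- "C:/Users/sandesh/Desktop/test_R"
sub.folder <- list.dirs(parent.folder, recursive = TRUE)[-1]
file <- file.path(sub.folder, "sandesh1.xlsx")

# Read the first sheet of one workbook
fun <- function(file) {
  read.xlsx(file, sheetIndex = 1)
}

# Read every workbook and stack the results into one data frame
df.big <- ldply(file, fun)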
This is a typical problem with rJava. It is answered in the documentation for XLConnect, which also uses rJava to connect to Excel, the same way the xlsx library does. I quote from there:
"This is caused by the fact that XLConnect (same for xlsx) needs to copy your entire data object over to the JVM in order to write it to a file and the JVM has to be initialized with a fixed upper limit on its memory size. To change this amount, you can pass parameters to the R’s JVM just like you can to a command line Java process via rJava’s options support:
options(java.parameters = "-Xmx1024m")
library(XLConnect)
Note, however, that these parameters are evaluated exactly once per R session when the JVM is initialized - this is usually once you load the first package that uses Java support, so you should do this as early as possible."
As mentioned above, run the options function at the beginning of your script, before loading any libraries. If you are running it through RStudio, make sure you restart it first, so that the script runs in a fresh R session where the JVM has not been initialized yet.
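Applied to the script in the question, the ordering might look like the sketch below. The "-Xmx4g" value is only an assumption; pick a limit that fits the RAM on your machine. The last lines are an optional check, using rJava's .jcall, that the JVM really picked up the new limit.

```r
# Must run before any Java-backed package is loaded.
# "-Xmx4g" is an assumed value, not a recommendation.
options(java.parameters = "-Xmx4g")

library(rJava)
library(xlsx)

# Optional check: ask the running JVM for its maximum heap size.
.jinit()  # does nothing harmful if the JVM is already running
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_bytes <- .jcall(runtime, "J", "maxMemory")
max_heap_bytes / 1024^2  # maximum heap in MiB
```

If the reported value has not changed, the JVM was most likely already initialized earlier in the session, and a restart is needed.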
Also, please note that even this is not guaranteed to work, depending on the size of the file you are trying to parse.
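If raising the limit alone is not enough, one workaround that is sometimes suggested is to trigger garbage collection on both the R side and the Java side after each file, so that memory held for one workbook is released before the next one is read. The sketch below assumes the file vector from the question; read_one is a hypothetical helper name, and this will not help if a single workbook is too large for the heap on its own.

```r
library(rJava)
library(xlsx)
library(plyr)

# Hypothetical helper: read one workbook, then free memory before returning.
read_one <- function(path) {
  df <- read.xlsx(path, sheetIndex = 1)
  gc()                                       # drop R-side references to Java objects
  .jcall("java/lang/System", method = "gc")  # ask the JVM to run its garbage collector
  df
}

# 'file' is the vector of workbook paths built in the question.
df.big <- ldply(file, read_one)
```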