ff package write error

Published 2020-04-15 11:18

Question:

I'm trying to work with a 1909x139352 dataset in R. Since my computer only has 2GB of RAM, the dataset (about 500MB) is too big for the conventional methods, so I decided to use the ff package. However, I've been having some trouble: read.table.ffdf is unable to read even the first chunk of data. It fails with the following error:

txtdata <- read.table.ffdf(file="/directory/myfile.csv",
                           FUN="read.table",
                           header=FALSE,
                           sep=",",
                           colClasses=c("factor", rep("integer", 139351)),
                           first.rows=100, next.rows=100,
                           VERBOSE=TRUE)

  read.table.ffdf 1..100 (100)  csv-read=77.253sec
  Error en  ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  : 
   write error

Does anyone have any idea of what is going on?

Answer 1:

This error message indicates that you have too many open files. In ff, every column of an ffdf is backed by its own file, and the operating system only allows a limited number of open files per process. You have hit that limit. See my reply on Any ideas on how to debug this FF error?.

So in your case, simply calling read.table.ffdf won't work because you have 139352 columns. It is still possible to import the data with ff, but you need to be careful about how many columns are open at once while pulling data into RAM, to avoid hitting the limit.
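One way to keep the number of simultaneously open ff files small is to import the columns in blocks and close each block's files before opening the next. The sketch below is untested and makes two assumptions: that the `colClasses="NULL"` convention for skipping columns (standard in `read.table`) is passed through by `read.table.ffdf`, and that the path and block size are placeholders you would adapt:

```r
# Sketch (untested): import 139352 columns in blocks of 500, so only a few
# hundred ff column files are open at any one time.
library(ff)

n_cols     <- 139352
block_size <- 500   # assumption: keep this well under your open-file limit
blocks     <- split(seq_len(n_cols), ceiling(seq_len(n_cols) / block_size))

pieces <- lapply(blocks, function(cols) {
  classes <- rep("NULL", n_cols)          # "NULL" tells read.table to skip a column
  classes[cols] <- "integer"
  if (1 %in% cols) classes[1] <- "factor" # first column is a factor in this dataset
  piece <- read.table.ffdf(file = "/directory/myfile.csv",
                           header = FALSE, sep = ",",
                           colClasses = classes)
  close(piece)   # close.ffdf releases the file handles before the next block
  piece
})
```

Each piece can later be reopened individually (with `open`) when you actually need those columns, so the full set of 139352 files is never open at the same time.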



Answer 2:

Your data set really isn't that big. It might help if you said something about what you're trying to do with it. This might help: Increasing Available memory in R. If that doesn't work, the data.table package is very fast and doesn't hog memory when manipulating data.tables with the := operator.

As for read.table.ffdf, check out the read.table.ffdf tutorial; if you read carefully, it gives hints and details about optimizing your memory usage with commands like gc() and more.
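For illustration, a minimal data.table sketch of modification by reference with `:=` (the file path is from the question; the derived column name `total` is hypothetical, and `V1`..`Vn` are fread's default names for headerless files):

```r
library(data.table)

# fread is data.table's fast, memory-friendly CSV reader
dt <- fread("/directory/myfile.csv", header = FALSE)

# `:=` adds or modifies a column by reference -- no copy of the table is made
dt[, total := V2 + V3]

gc()  # ask R to return freed memory to the OS, as the tutorial suggests
```

Because `:=` works in place, a chain of such transformations never duplicates the whole table, which matters on a 2GB machine.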



Answer 3:

I recently encountered this problem with a data frame that had ~3,000 columns. The easiest way to get around it is to raise the maximum number of open files allowed for your user account. A typical system default is ~1024, which is a very conservative limit. Do note that it is set that low to prevent resource exhaustion on the server.

On Linux:

Add the following to your /etc/security/limits.conf file.

youruserid hard nofile 200000   # you may enter whatever number you wish here
youruserid soft nofile 200000   # the default for each shell or process you have running

On OS X:

Add or edit the following in your /etc/sysctl.conf file:

kern.maxfilesperproc=200000
kern.maxfiles=200000

You'll need to log out and log back in, but after that the original poster should be able to use ffdf to open the 139352-column data frame.
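To verify what limits are actually in effect before and after the change, you can query them from the shell. This is a generic POSIX-shell check, not specific to ff:

```shell
# Soft limit: the ceiling currently enforced for this shell and its children
ulimit -Sn

# Hard limit: the most the soft limit can be raised to without root privileges
ulimit -Hn

# A non-root user may raise the soft limit up to the hard limit for this session,
# e.g.: ulimit -Sn 65536
```

If `ulimit -Sn` still prints the old value after editing the config files, the new limits have not been picked up by your login session yet.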

I've posted more about my run-in with this limit here.



Tags: r ff ffbase