Parallel processing of big rasters in R (Windows)

Posted 2019-04-23 11:21

Question:

I'm using the doSNOW package, and more specifically the parLapply function (from snow, which doSNOW loads), to perform reclassification (and subsequently other operations) on a list of big raster datasets (OS: Windows x64).

The code looks roughly like this minimal example:

library(raster)
library(doSNOW)

# create a list of test rasters
x <- raster(ncol = 10980, nrow = 10980)
x <- setValues(x, 1:ncell(x))

list.x <- replicate(9, x)

# set up the cluster
NumberOfCluster <- 8
cl <- makeCluster(NumberOfCluster)
registerDoSNOW(cl)
junk <- clusterEvalQ(cl, library(raster))

# perform the calculation on each raster
list.x <- parLapply(cl, list.x, function(x) calc(x, function(x) { x * 10 }))

# stop the cluster
stopCluster(cl)

The code actually works as intended. The problem occurs when I want to proceed with the results. I'm receiving this error message:

> plot(list.x[[1]])
Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") :
  cannot open file 'C:\Users\*****\AppData\Local\Temp\RtmpyKYdpY\raster\r_tmp_2016-02-29_133158_752_67867.gri': No such file or directory

As far as I understand, because the rasters are quite big, the results are written to temp files on disk rather than held in memory. Once I close the snow cluster, those files can no longer be accessed.

So my question is, how can I access the data once the cluster is closed? Can I proceed using this method?

Thanks!

Answer 1:

I had this exact problem while running the rasterize function inside a cluster in R.

All tests worked perfectly, but when I scaled up to very large, fine-resolution rasters, I repeatedly got errors about temp files that I couldn't even find on my computer. The list object, which I needed to merge and write as a single raster, was present in R, but I could do nothing with it.

After watching the temp file directory while the cluster was running, I noticed that closing the cluster automatically deletes all the temp files its workers created. I therefore had to run the merge and writeRaster steps inside the cluster; otherwise they failed with an error very similar to yours.
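
The behaviour described above can be checked with a small sketch. It assumes the base parallel package rather than snow/doSNOW (the cluster functions used here have the same names in both), and two workers chosen only for illustration. Each worker is a separate R session with its own session temp directory, and R normally removes that directory when the session exits, so the last check is expected, but not guaranteed, to report that the files are gone.

```r
library(parallel)

# Assumption: a 2-worker cluster from the parallel package, for illustration only
cl <- parallel::makeCluster(2)

# Each worker is its own R session with its own session temp directory
master_tmp <- tempdir()
worker_tmp <- unlist(clusterEvalQ(cl, tempdir()))
print(master_tmp)
print(worker_tmp)

# Create one small file inside each worker's temp directory
worker_files <- unlist(clusterEvalQ(cl, {
  f <- tempfile(fileext = ".txt")
  writeLines("test", f)
  f
}))

# The files are visible from the master while the workers are alive
exists_before <- file.exists(worker_files)
print(exists_before)

stopCluster(cl)
Sys.sleep(2)  # give the worker processes a moment to exit

# After the workers have exited, their temp directories are normally removed
exists_after <- file.exists(worker_files)
print(exists_after)
```

The path in the error message from the question (a `raster` folder inside an `Rtmp...` directory) fits this pattern: it appears to be a worker's session temp directory rather than the master's.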



Answer 2:

You could pass specific filenames to calc (or, e.g., reclassify), and have your function return those filenames as a vector to be read into a stack:

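ff <- parSapply(cl, list.x, function(x) {
  # pick an output file name on the worker and write the result there
  f <- tempfile(fileext = '.tif')
  calc(x, function(x) x * 10, filename = f)
  # return the file name rather than the Raster* object
  f
})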

s <- stack(ff)
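
One caveat with the snippet above: tempfile() is evaluated on the worker, so the returned paths point into that worker's session temp directory, which R normally removes when the worker exits. A variation, sketched below under stated assumptions, writes to a directory that outlives the cluster. The directory `raster_out`, the `calc_NN.tif` file names, the small 100 x 100 test rasters, and the use of clusterMap from the parallel package are all choices made for illustration, not part of the original answer.

```r
library(raster)
library(parallel)

# Hypothetical output directory, chosen for illustration; unlike a worker's
# tempdir(), it is not removed when the worker session ends
out_dir <- file.path(getwd(), "raster_out")
dir.create(out_dir, showWarnings = FALSE)

# Small test rasters so the sketch runs quickly
x <- raster(ncol = 100, nrow = 100)
x <- setValues(x, 1:ncell(x))
list.x <- replicate(3, x)

# One (hypothetical) output file name per input raster
out_files <- file.path(out_dir, sprintf("calc_%02d.tif", seq_along(list.x)))

cl <- parallel::makeCluster(2)
junk <- clusterEvalQ(cl, library(raster))

# Each worker writes its result to the given file and returns the file name
ff <- clusterMap(cl, function(r, f) {
  calc(r, function(v) v * 10, filename = f, overwrite = TRUE)
  f
}, list.x, out_files)

stopCluster(cl)

# The files are still on disk after the cluster has been stopped
s <- stack(unlist(ff))
print(s)
```

Because the function returns only file names, nothing that refers to a worker's temp directory travels back to the master session.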

But also look at ?clusterR; I suspect it will work with reclassify. From the docs:

This function only works with functions that have a Raster* object as first argument and that operate on a cell by cell basis (i.e., there is no effect of neighboring cells) and return an object with the same number of cells as the input raster object. The first argument of the function called must be a Raster* object. There can only be one Raster* object argument. For example, it works with calc and it also works with overlay as long as you provide a single RasterStack or RasterBrick as the first argument.
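
A minimal sketch of the clusterR route, using calc as in the question. It assumes a small 100 x 100 test raster and two nodes, both chosen for illustration; depending on the installed raster version, beginCluster may also need the snow package. Here clusterR splits a single raster into blocks and distributes those across the nodes, instead of sending one whole raster to each worker.

```r
library(raster)

# Small test raster so the sketch runs quickly (an assumption for illustration)
x <- raster(ncol = 100, nrow = 100)
x <- setValues(x, 1:ncell(x))

# Start a cluster managed by the raster package (2 nodes, for illustration)
beginCluster(2)

# Apply calc block by block across the nodes; extra arguments for calc go in args
y <- clusterR(x, calc, args = list(fun = function(v) v * 10))

endCluster()

print(y)
```

The result is assembled in the master session, and clusterR also accepts a filename argument if the output should be written to a specific file.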