raster package taking all hard drive

Posted 2019-03-13 16:42

Question:

I am processing a time series of rasters (MODIS NDVI imagery) to calculate the average and standard deviation of the series. Each yearly series is composed of 23 NDVI .tif images of 508 MB each, so there is about 11 GB to process per year. Below is the script for one year. I have to repeat this for a number of years.

library(raster)
library("rgeos")
filesndvi <- list.files(pattern="NDVI.tif", full.names=TRUE)
filesetndvi10 <- stack(filesndvi)
names(filesetndvi10)
avgndvi10<-mean(filesetndvi10)
desviondvi10 <- filesetndvi10 - avgndvi10
sumdesvioc <-sum(desviondvi10^2)
varndvi10  <- sumdesvioc/nlayers(filesetndvi10)
sdndvi10  <- sqrt(varndvi10)
cvndvi10  <- sdndvi10/avgndvi10

The problem: the process keeps writing to the hard drive until it is full. I don't know where on the drive it writes. The only way I've found to free the space is to reboot. I tried rm, which didn't work, and closing RStudio, which didn't work either. I'm using R 3.0.2 with RStudio 0.98.994 on Ubuntu 14.04, on an Asus UX31 with 4 GB RAM and a 256 GB HD. Any thoughts on how to clean the HD after the calculation for each year, without rebooting, would be very welcome. Thanks

Answer 1:

There are two other things to consider. First, make fewer intermediate files by combining steps in calc or overlay functions (there is not much scope for that here, but there is some). This can also speed up computations, as there will be less reading from and writing to disk. Second, take control of deleting specific files. In the calc and overlay functions you can provide filenames, so that you can remove the files you no longer need. You can also delete the temp files explicitly. It is of course good practice to first remove the objects that point to these files. Here is an example based on yours.

library(raster)
# example data
set.seed(0)
ndvi <- raster(nc=10, nr=10)
n1 <- setValues(ndvi, runif(100) * 2 - 1)
n2 <- setValues(ndvi, runif(100) * 2 - 1)
n3 <- setValues(ndvi, runif(100) * 2 - 1)
n4 <- setValues(ndvi, runif(100) * 2 - 1)
filesetndvi10 <- stack(n1, n2, n3, n4)

nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)
# squared deviations from the mean, written to a named file
desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2 , filename='over_tmp.grd')
# standard deviation: sum, division and square root combined in one step
sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
# coefficient of variation; this file is not deleted below
cvndvi10  <- overlay(sdndvi10, avgndvi10, fun=function(x,y) x / y, filename='cvndvi10.grd', overwrite=TRUE)

# remove the objects first, then the files they pointed to
f <- filename(avgndvi10)
rm(avgndvi10, desviondvi10_2, sdndvi10)
file.remove(c(f, extension(f, '.gri')))
file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))
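
As a rough illustration of combining steps, the mean, standard deviation and coefficient of variation can be folded into one per-pixel function, so that a single calc() call could replace the separate overlay() and calc() passes. The sketch below uses only base R on a small made-up array. The name cv_pixel is hypothetical, and it assumes the same population variance (dividing by the number of layers) as the script in the question.

```r
# Per-pixel coefficient of variation from the values of all layers at one cell.
# Uses the population SD (divide by n), matching sumdesvioc / nlayers(...) in the question.
cv_pixel <- function(x) {
  m <- mean(x)
  sqrt(mean((x - m)^2)) / m
}

# Made-up stand-in for a stack: 2 x 2 pixels, 4 layers (third dimension).
set.seed(0)
vals <- array(runif(16, min = 0.1, max = 1), dim = c(2, 2, 4))

# Apply the function over the layer dimension, one pixel at a time.
cv <- apply(vals, c(1, 2), cv_pixel)
print(cv)

# With the raster package the same function could be passed to calc(), e.g.
# cvndvi10 <- calc(filesetndvi10, fun = cv_pixel, filename = 'cvndvi10.grd', overwrite = TRUE)
```

Done this way only the final file is written, at the cost of not keeping the mean and SD layers. Note that cv_pixel returns NA for any pixel with a missing value in one of the layers.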

To find out where temp files are written, look at

rasterOptions()

or to get the path as a variable do:

dirname(rasterTmpFile()) 

To set the path, use

rasterOptions(tmpdir='a path')
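
To see how much the temp directory actually holds before and after a cleanup, a small base R helper may be enough. This is only a sketch: dir_size_mb is a hypothetical name, not a raster function, and the demonstration uses a throwaway folder under tempdir() rather than the real raster temp directory.

```r
# Total size, in MB, of all files below 'path' (0 if the folder is empty or missing).
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  if (length(files) == 0) return(0)
  sum(file.info(files)$size, na.rm = TRUE) / 1024^2
}

# Demonstration on a throwaway folder.
demo_dir <- file.path(tempdir(), "size_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(2 * 1024^2), file.path(demo_dir, "dummy.bin"))  # 2 MB of zero bytes
print(dir_size_mb(demo_dir))

# After deleting the folder the helper reports 0.
unlink(demo_dir, recursive = TRUE)
print(dir_size_mb(demo_dir))

# For raster, the folder to inspect would be: dir_size_mb(dirname(rasterTmpFile()))
```

Calling it on the raster temp path between years shows whether a cleanup step really freed the space.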


Answer 2:

I struggle with the same problem, but have a few tricks that help. First, get more memory: RAM and HD space are cheap and have a dramatic effect when dealing with large R objects such as rasters. Second, use removeTmpFiles() in the raster package. You can set it to remove temp files older than a certain number of hours, e.g. removeTmpFiles(0.5) removes temp files older than 30 minutes. Make sure you only run it once the files will no longer be needed. Third, use something like the rasterOptions() snippet below. Be careful with the memory and chunk size settings; the values shown are from my setup and will probably NOT suit your system, but you might find something better optimized than the defaults. Finally, use rm() and gc() to clean as you cook. Hope this helps, but if you find a better solution please let me know.

# 'drive' is assumed to be defined elsewhere and to hold a drive letter
tmpdir_name <- paste0(drive, ":/RASTER_TEMP/")
if (!file.exists(tmpdir_name)) {
    dir.create(tmpdir_name)
}

rasterOptions(datatype = "FLT4S", 
    progress = "text", 
    tmpdir = tmpdir_name, 
    tmptime = 4, 
    timer = TRUE,
    tolerance = 0.5,
    chunksize = 1e+08,
    maxmemory = 1e+09)
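
Put together for the yearly loop in the question, "clean as you cook" might look like the sketch below. It is an illustration under assumptions, not a tested recipe for the real data: the input is a handful of small made-up layers written to a folder under tempdir(), and the file naming scheme, work_dir and cv_pixel are all hypothetical.

```r
library(raster)

# Made-up input: for each of two hypothetical years, four small layers on disk.
work_dir <- file.path(tempdir(), "ndvi_demo")
dir.create(work_dir, showWarnings = FALSE)
years <- c(2010, 2011)
set.seed(0)
for (yr in years) {
  for (i in 1:4) {
    r <- setValues(raster(nc = 10, nr = 10), runif(100, min = 0.1, max = 1))
    writeRaster(r, file.path(work_dir, paste0(yr, "_", i, "_NDVI.grd")), overwrite = TRUE)
  }
}

# Per-pixel coefficient of variation (population SD divided by the mean).
cv_pixel <- function(x) { m <- mean(x); sqrt(mean((x - m)^2)) / m }

for (yr in years) {
  files <- list.files(work_dir, pattern = paste0("^", yr, "_.*NDVI\\.grd$"), full.names = TRUE)
  s <- stack(files)
  out <- file.path(work_dir, paste0("cv_", yr, ".grd"))
  cv <- calc(s, fun = cv_pixel, filename = out, overwrite = TRUE)

  # Clean as you go: drop the objects, collect garbage, then delete raster temp files.
  rm(s, cv)
  gc()
  removeTmpFiles(h = 0)  # h = 0: remove temp files regardless of age
}
```

Because removeTmpFiles(h = 0) clears the whole raster temp folder, it is only safe when no other running process shares that folder.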


Answer 3:

I found another way to manage this problem that was better for me, drawing on this answer. In my case, I am using parallel looping and don't want to remove all the files from the temporary directory because it could remove other processes' temp files.

@RobertH's answer, which suggests naming each temporary file, is good, but I wasn't sure whether that forces raster to write even small files to the hard drive instead of keeping them in RAM, slowing down the process (the raster documentation says it only writes to disk if the file won't fit into RAM).

So, what I did is create a temporary directory from within the loop or parallel process that is tied to a unique name from the data that is being processed in the loop, in my case, the value of single@data$OWNER:

#creates unique filepath for temp directory
dir.create(file.path("c:/",single@data$OWNER), showWarnings = FALSE)

#sets temp directory
rasterOptions(tmpdir=file.path("c:/",single@data$OWNER)) 

Insert your processing code here, then at the end of the loop delete the whole folder:

#removes entire temp directory without affecting other running processes
unlink(file.path("c:/",single@data$OWNER), recursive = TRUE)
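
One way to make sure the folder is removed even when the processing step fails is to wrap the pattern in a function that uses on.exit(). The sketch below is base R only. The names with_task_tmpdir and "owner_A" are hypothetical, and in real use the function would also call rasterOptions(tmpdir = task_dir) before running the work.

```r
# Run 'work(task_dir)' with a private temp folder that is deleted afterwards,
# whether 'work' succeeds or throws an error.
with_task_tmpdir <- function(task_name, work, base_dir = tempdir()) {
  task_dir <- file.path(base_dir, task_name)
  dir.create(task_dir, recursive = TRUE, showWarnings = FALSE)
  on.exit(unlink(task_dir, recursive = TRUE), add = TRUE)
  # In real use: raster::rasterOptions(tmpdir = task_dir)
  work(task_dir)
}

# Demonstration: the work step writes a scratch file and reports that it exists.
result <- with_task_tmpdir("owner_A", function(d) {
  writeLines("scratch", file.path(d, "tmp.txt"))
  file.exists(file.path(d, "tmp.txt"))
})
print(result)

# The private folder is gone once the function has returned.
print(dir.exists(file.path(tempdir(), "owner_A")))
```

Each parallel worker then only ever touches its own folder, so one worker's cleanup cannot delete another's temp files.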


Answer 4:

I noticed that in RobertH's useful answer, the last suggested command has an extra "e". It should be
rasterOptions(tmpdir='a path')

instead of
rasterOptions(tempdir='a path')



Answer 5:

Maybe it's obvious, but another tip I found while implementing the advice in this thread is to be careful about the order in which you run the instructions. Avoid running all the steps in bulk and then "cleaning everything afterwards". Split the code into small pieces and clean up after each one. For example, instead of (from above):

[...]
nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)

desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2 , 
filename='over_tmp.grd')
sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
cvndvi10  <- overlay(sdndvi10, avgndvi10, fun=function(x,y) x / y, filename='cvndvi10.grd', overwrite=TRUE)

f <- filename(avgndvi10)
rm(avgndvi10, desviondvi10_2, sdndvi10)
file.remove(c(f, extension(f, '.gri')))
file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))

This would require much less space both in terms of RAM and drive:

[...]
nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)

desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2 , 
filename='over_tmp.grd')

sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
# the squared deviations are no longer needed once the SD exists
rm(desviondvi10_2)
file.remove(c('over_tmp.grd', 'over_tmp.gri'))

cvndvi10  <- overlay(sdndvi10, avgndvi10, fun=function(x,y) x / y, filename='cvndvi10.grd', overwrite=TRUE)
# the mean and SD are no longer needed once the CV exists; cvndvi10.grd is the result and is kept
f <- filename(avgndvi10)
rm(avgndvi10, sdndvi10)
file.remove(c('calc_tmp.grd', 'calc_tmp.gri'))
if (nzchar(f)) file.remove(c(f, extension(f, '.gri')))


Tags: r raster