freeing up memory in R

In R, I am trying to combine and convert several sets of timeseries data as an xts from http://www.truefx.com/?page=downloads however, the files are large and there many files so this is causing me issues on my laptop. They are stored as a csv file which have been compressed as a zip file.

Downloading them and unzipping them is easy enough (although takes up a lot of space on a hard drive).

Loading the 350MB+ files for one month's worth of data into the R is reasonably straight forward with the new fread() function in the data.table package.

Some datatable transformations are done (inside a function) so that the timestamps can be read easily and a mid column is produced. Then the datatable is saved as an RData file on the hard drive, and all references are to the datatable object are removed from the workspace, and a gc() is run after removal...however when looking at the R session in my Activity Monitor (run from a Mac)...it still looks like it is taking up almost 1GB of RAM...and things seem a bit laggy...I was intending to load several years worth of the csv files at the same time, convert them to useable datatables, combine them and then create a single xts object, which seems infeasible if just one month uses 1GB of RAM.

I know I can sequentially download each file, convert it, save it shut down R and repeat until i have a bunch of RData files that i can just load and bind, but was hopeing there might be a more efficient manner to do this so that after removing all references to a datatable you get back not "normal" or at startup levels of RAM usage. Are there better ways of clearing memory than gc()? Any suggestions would be greatly appreciated.

标签： r garbage-collection

1条回答

啃猪蹄的小仙女

2楼-- · 2019-04-08 00:42

In my project I had to deal with many large files. I organized the routine on the following principles:

Isolate memory-hungry operations in separate R scripts.
Run each script in new process which is destroyed after execution. Thus system gives used memory back.
Pass parameters to the scripts via text file.

Consider the toy example below.

Data generation:

setwd("/path/to")
write.table(matrix(1:5e7, ncol=10), "temp.csv") # 465.2 Mb file

slave.R - memory consuming part

setwd("/path/to")
library(data.table)

# simple processing
f <- function(dt){
  dt <- dt[1:nrow(dt),]
  dt[,new.row:=1]
  return (dt)
}

# reads parameters from file
csv <- read.table("io.csv")
infile  <- as.character(csv[1,1])
outfile <- as.character(csv[2,1])

# memory-hungry operations
dt <- as.data.table(read.csv(infile))
dt <- f(dt)
write.table(dt, outfile)

master.R - executes slaves in separate processes

setwd("/path/to")

# 3 files processing
for(i in 1:3){
  # sets iteration-specific parameters
  csv <- c("temp.csv", paste("temp", i, ".csv", sep=""))
  write.table(csv, "io.csv")

  # executes slave process
  system("R -f slave.R")
}

0人赞添加讨论(0) 举报

freeing up memory in R

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间