Cannot allocate memory even after deleting large objects

Posted 2019-08-20 16:56

Question:

I'm encountering an intermittent "cannot allocate memory" error in R after using a foreach %dopar% loop with doParallel.

I run a %dopar% loop once after starting a fresh session and everything works: my script takes a corpus, transforms it, and outputs a matrix.

If I then call rm(list = ls()) and closeAllConnections(), I am unable to do anything afterwards without hitting a memory error.
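For clarity, that cleanup between runs is just:

rm(list = ls())       # drop everything in the global environment
closeAllConnections() # close any open connections
# an explicit gc() here would be one more thing to try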

Running the function again with slightly altered parameters gives "Error in mcfork() : unable to fork, possible reason: Cannot allocate memory", and after that I get an error from ANYTHING else I try. For example, typing sessionInfo() gives "Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'".
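One thing that should still work at that point is gc(), since it reports R's own memory use without forking a shell the way system() does (a small diagnostic aside, not a fix):

gc() # prints R's memory usage; needs no fork/popen, unlike system()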

I tried opening the shell in RStudio (hosted RStudio: Tools > Shell), which gives a popup saying "cannot allocate memory".

I tried adding stopImplicitCluster() after my %dopar% loop, as well as closeAllConnections().
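For comparison, the fully explicit setup/teardown pattern (a sketch using doParallel's cluster interface; makeCluster(), registerDoParallel(cl), and stopCluster(cl) are the documented calls) looks like this:

library(doParallel)

cl <- makeCluster(12)   # explicit worker cluster (PSOCK by default)
registerDoParallel(cl)  # register it as the %dopar% backend
# ... foreach(i = ...) %dopar% { ... } goes here ...
stopCluster(cl)         # shut the workers down explicitly when done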

I don't know where to look next. Does this sound familiar to anyone?

I noticed in the terminal (running top, then pressing 1 to show each core) that all my cores are at 100% but the processes are listed as sleeping, and I'm not sure what that means. (Screenshot omitted.)
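A related sanity check (a sketch, assuming a Linux host with GNU ps available) is to list the child processes of the current R session from the R console; leftover forked workers would show up there:

# list child processes of this R session (GNU ps syntax)
system(paste("ps --ppid", Sys.getpid()))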

I'm not sure what other information to provide.

This is the script that runs perfectly fine once in a fresh session, but then seems to leave me with no memory.

library(foreach)
library(doParallel)
library(tm)
# qdap is used via qdap::rm_stopwords() below;
# stringi_spelling_update() and spellingdoc are defined elsewhere in the session

clean_corpus <- function(corpus, n = 1000) { # n is the length of each piece for parallel processing

  # split the corpus into pieces for looping to get around memory issues with transformation
  nr <- length(corpus)
  pieces <- split(corpus, rep(1:ceiling(nr/n), each=n, length.out=nr))
  lenp <- length(pieces)

  rm(corpus) # save memory

  # save pieces to rds files since not enough RAM
  tmpfile <- tempfile() 
  for (i in seq_len(lenp)) {
    saveRDS(pieces[[i]],
            paste0(tmpfile, i, ".rds"))
  }

  rm(pieces) # save memory since now these are saved in tmp rds files

  # doparallel
  registerDoParallel(cores = 12) # on Unix this forks 12 workers (multicore backend)
  pieces <- foreach(i = seq_len(lenp)) %dopar% {
    # update spelling
    piece <- readRDS(paste0(tmpfile, i, ".rds"))
    # spelling update based on lut
    piece <- tm_map(piece, function(i) stringi_spelling_update(i, spellingdoc))
    # regular transformations
    piece <- tm_map(piece, removeNumbers)
    piece <- tm_map(piece, content_transformer(removePunctuation), preserve_intra_word_dashes = TRUE)
    piece <- tm_map(piece, content_transformer(function(x, ...) 
      qdap::rm_stopwords(x, stopwords = tm::stopwords("english"), separate = F)))
    saveRDS(piece, paste0(tmpfile, i, ".rds"))
    return(1) # return a placeholder so %dopar% doesn't keep the piece in memory; it's already saved to rds
  } 

  stopImplicitCluster() # I added this, though per the documentation I may not need it since doParallel closes implicit clusters automatically?

  # combine the pieces back into one corpus
  # foreach with %do% already collects the results into a list,
  # so there is no need to pre-allocate or index into corpus
  corpus <- foreach(i = seq_len(lenp)) %do% {
    readRDS(paste0(tmpfile, i, ".rds"))
  }
  corpus <- do.call(function(...) c(..., recursive = TRUE), corpus)
  return(corpus)

} # end clean_corpus function
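For reference, a minimal usage sketch (my_corpus is a hypothetical tm corpus; spellingdoc and stringi_spelling_update() must already exist in the session, and the DocumentTermMatrix() call is just to illustrate the matrix step mentioned above):

cleaned <- clean_corpus(my_corpus, n = 1000)   # hypothetical corpus object
dtm <- tm::DocumentTermMatrix(cleaned)         # e.g. the downstream matrix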
Tags: r doparallel