Using parLapply and clusterExport inside a function

I asked a related question earlier, and the answer worked well: "using parallel's parLapply: unable to access variables within parallel code".

The problem is that when I try to use that answer inside a function, it doesn't work; I think it has to do with the default environment of clusterExport. I've read the vignette and looked at the help file, but I'm approaching this with a very limited knowledge base. Given the way I used parLapply, I expected it to behave like lapply, but it doesn't appear to.

Here is my attempt:

par.test <- function(text.var, gc.rate=10){ 
    ntv <- length(text.var)
    require(parallel)
    pos <-  function(i) {
        paste(sapply(strsplit(tolower(i), " "), nchar), collapse=" | ")
    }
    cl <- makeCluster(mc <- getOption("cl.cores", 4))
    clusterExport(cl=cl, varlist=c("text.var", "ntv", "gc.rate", "pos"))
    parLapply(cl, seq_len(ntv), function(i) {
            x <- pos(text.var[i])
            if (i%%gc.rate==0) gc()
            return(x)
        }
    )
}

Calling it gives this error message:

> par.test(rep("I like cake and ice cream so much!", 20))
Error in get(name, envir = envir) : object 'text.var' not found
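For comparison, here is a sketch of the sequential lapply version that the asker expected parLapply to mirror (par.test.serial is a hypothetical name, not from the question). lapply runs in the current R session, so the anonymous function finds text.var and pos in the enclosing function's frame through ordinary lexical scoping, and nothing has to be exported.

```r
# Hypothetical sequential counterpart of par.test, for comparison only.
par.test.serial <- function(text.var) {
    # lengths of the words in one sentence, joined with " | "
    pos <- function(i) {
        paste(sapply(strsplit(tolower(i), " "), nchar), collapse = " | ")
    }
    # text.var and pos are found by lexical scoping in this function's frame
    lapply(seq_along(text.var), function(i) pos(text.var[i]))
}

res <- par.test.serial(rep("I like cake and ice cream so much!", 20))
res[[1]]  # "1 | 4 | 4 | 3 | 3 | 5 | 2 | 5"
```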

Tags: r parallel-processing

2 answers

欢心

Post #2 · 2019-01-21 14:23

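By default, clusterExport looks in .GlobalEnv for the objects named in varlist (its envir argument defaults to .GlobalEnv). That is why get(name, envir = envir) fails with "object 'text.var' not found": text.var exists only inside par.test, not in .GlobalEnv. If your objects are not in .GlobalEnv, you must tell clusterExport which environment to find them in.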
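A minimal sketch of that behaviour, assuming a local two-worker cluster created with makeCluster(2); export_local and local_value are hypothetical names. Run in a fresh session (so that no global local_value exists), the default lookup fails in the same way as in the question, while envir = environment() succeeds.

```r
library(parallel)

# Hypothetical helper: export a variable that exists only inside a function.
export_local <- function(cl, use_local_env) {
    local_value <- 42  # lives in this function's frame, not in .GlobalEnv
    if (use_local_env) {
        # look the name up in this function's own environment
        clusterExport(cl, "local_value", envir = environment())
    } else {
        # default envir = .GlobalEnv, where local_value does not exist
        clusterExport(cl, "local_value")
    }
    # read the exported value back from every worker
    unlist(clusterEvalQ(cl, local_value))
}

cl <- makeCluster(2)
bad <- tryCatch(export_local(cl, FALSE), error = function(e) e)
good <- export_local(cl, TRUE)  # one copy of 42 per worker
stopCluster(cl)
```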

You can change your clusterExport call to the following (I didn't test it, but you said in the comments that it works):

clusterExport(cl=cl, varlist=c("text.var", "ntv", "gc.rate", "pos"), envir=environment())

Called inside par.test, environment() returns par.test's own evaluation environment, so clusterExport looks for the objects to export there.
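Applied to the question's function, the change looks like the following sketch. The on.exit(stopCluster(cl)) line is an addition that is not in the question; it makes sure the worker processes are shut down when par.test returns, whether or not an error occurred.

```r
library(parallel)

par.test <- function(text.var, gc.rate = 10) {
    ntv <- length(text.var)
    # lengths of the words in one sentence, joined with " | "
    pos <- function(i) {
        paste(sapply(strsplit(tolower(i), " "), nchar), collapse = " | ")
    }
    cl <- makeCluster(getOption("cl.cores", 4))
    on.exit(stopCluster(cl))  # release the workers on exit
    # envir = environment(): find the names in par.test's own frame
    clusterExport(cl = cl, varlist = c("text.var", "ntv", "gc.rate", "pos"),
                  envir = environment())
    parLapply(cl, seq_len(ntv), function(i) {
        x <- pos(text.var[i])
        if (i %% gc.rate == 0) gc()
        x
    })
}

res <- par.test(rep("I like cake and ice cream so much!", 20))
```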


家丑人穷心不美

Post #3 · 2019-01-21 14:23

Another solution is to add the extra variables as arguments to the worker function and pass them to parLapply, which forwards them (through its ... argument) to the workers as well. If text.var is the big data, it pays to make it the object that is iterated over (X) rather than an index, because then each worker is sent only its own portion of text.var, rather than every worker receiving the whole object.
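In the R versions I have looked at, parLapply cuts X into chunks using parallel's exported splitIndices(nx, ncl), by default one chunk per worker. The sketch below calls splitIndices directly to show which slice of a made-up 20-element text.var each of four workers would receive; the data is a stand-in, not from the question.

```r
library(parallel)

# Split the indices 1:20 into 4 contiguous chunks, one per worker.
chunks <- splitIndices(20, 4)
lengths(chunks)  # 5 5 5 5

# Stand-in data: the slice of text.var each of the 4 workers would receive
text.var <- paste("sentence", 1:20)
per_worker <- lapply(chunks, function(idx) text.var[idx])
per_worker[[2]]  # "sentence 6" ... "sentence 10"
```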

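par.test <- function(text.var) {
    library(parallel)
    # lengths of the words in one sentence, joined with " | "
    pos <- function(i) {
        paste(sapply(strsplit(tolower(i), " "), nchar), collapse=" | ")
    }
    cl <- makeCluster(getOption("cl.cores", 4))
    on.exit(stopCluster(cl))  # shut the workers down when par.test returns
    # X is text.var itself, so its elements are split across the workers;
    # pos reaches the worker function as an extra argument via parLapply's ...
    parLapply(cl, text.var, function(text.vari, pos) {
        pos(text.vari)
    }, pos)
}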

This is also conceptually pleasing. (It is rarely necessary to invoke the garbage collector explicitly.)
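One caveat that, as far as I can tell, applies to both answers: parLapply serializes fun (and extra arguments such as pos) to send them to the workers, and R serializes a closure together with its enclosing environment unless that environment is a special one such as .GlobalEnv or a package namespace. A function created inside par.test therefore carries par.test's frame, including the whole text.var, along with it. (This should also mean the anonymous function in the question can already reach text.var and pos on the workers; the error came from clusterExport's lookup on the master.) The sketch below measures the effect with serialize() and needs no cluster; make_worker_inside, worker_top and worker_reset are hypothetical names. Defining the worker function and helpers like pos at top level, or resetting their environment as shown, avoids shipping the captured frame.

```r
# Hypothetical factory: the returned function is created inside another
# function, so its enclosing environment is that function's frame.
make_worker_inside <- function() {
    text.var <- rep("I like cake and ice cream so much!", 1e4)  # "big" local data
    function(x) nchar(x)  # never touches text.var, but its environment holds it
}
worker_inside <- make_worker_inside()

# Same body defined at top level: its environment is .GlobalEnv, which
# serialize() writes as a reference instead of copying its contents.
worker_top <- function(x) nchar(x)

# Re-pointing a copy of the closure at .GlobalEnv drops the captured frame.
worker_reset <- worker_inside
environment(worker_reset) <- globalenv()

# Serialized size in bytes of each version
sizes <- sapply(list(inside = worker_inside, top = worker_top,
                     reset = worker_reset),
                function(f) length(serialize(f, NULL)))
sizes  # "inside" is far larger than the other two
```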
