tm_map has parallel::mclapply error in R 3.0.1 on

I am using R 3.0.1 on Platform: x86_64-apple-darwin10.8.0 (64-bit)

I am trying to use tm_map from the tm library, but when I execute this code:

library(tm)
data('crude')
tm_map(crude, stemDocument)

I get this error:

Warning message:
In parallel::mclapply(x, FUN, ...) :
  all scheduled cores encountered errors in user code

Does anyone know a solution for this?

Tags: r parallel-processing tm mclapply

7 answers

再贱就再见

Reply #2 · 2019-01-14 22:19

I faced the same issue but finally got it fixed. My guess is that naming the corpus something long, such as "longName" or "companyNewsCorpus", triggers the issue, whereas naming it "a" works fine. Really weird.

The code below gives the same error message mentioned in this thread:

companyNewsCorpus <- Corpus(DirSource("SourceDirectory"),
                            readerControl = list(language = "english"))
companyNewsCorpus <- tm_map(companyNewsCorpus,
                            removeWords, stopwords("english"))

But if I change it to the following, it works without issues:

a <- Corpus(DirSource("SourceDirectory"),
            readerControl = list(language = "english"))
a <- tm_map(a, removeWords, stopwords("english"))


The star\"

Reply #3 · 2019-01-14 22:20

I also ran into this issue while using the tm library's removeWords function. Some of the other answers, such as setting the number of cores to 1, did work for removing the set of English stop words. However, I also wanted to remove a custom list of first names and surnames from my corpus, and these lists were each upwards of 100,000 words long.

None of the other suggestions helped with this, and through some trial and error it turned out that removeWords seemed to have a limit of 1000 words per vector. So I wrote this function, which solved the issue for me:

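# Let x be a corpus
# Let y be a character vector containing the words to remove
removeManyWords <- function(x, y) {

      # Number of blocks of (at most) 1000 words needed to cover y
      n <- ceiling(length(y) / 1000)

      # seq_len(n) skips the loop entirely when y is empty
      for (i in seq_len(n)) {

            s <- (i - 1) * 1000 + 1
            # Cap the end index so the last block does not run past y
            # (indexing beyond length(y) would pad the block with NA)
            e <- min(i * 1000, length(y))
            x <- tm_map(x, content_transformer(removeWords), y[s:e])

      }

      x

}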

This function counts the words in the vector I want to remove, divides that count by 1000, and rounds up to the nearest whole number, n. It then loops n times, passing the next block of up to 1000 words to removeWords on each pass. With this method I didn't need to use lazy = TRUE or change the number of cores, as can be seen from the actual removeWords call in the function. Hope this helps!
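
The chunking step described above can also be sketched in base R, independent of tm. This is only an illustration: chunk_words and its chunk_size argument are hypothetical names, not part of tm, and the block size of 1000 comes from the trial-and-error observation above rather than from any documented limit.

```r
# Hypothetical helper: split a character vector into blocks of at most
# chunk_size elements, preserving the original order.
chunk_words <- function(words, chunk_size = 1000L) {
  if (length(words) == 0L) return(list())
  # Elements 1..chunk_size get group 1, the next chunk_size get group 2, ...
  groups <- ceiling(seq_along(words) / chunk_size)
  unname(split(words, groups))
}

# Made-up word list of 2500 entries, just to exercise the helper
words <- sprintf("w%04d", 1:2500)
chunks <- chunk_words(words)
print(lengths(chunks))
```

Each element of chunks could then be handed to tm_map in a loop, in the same way the function above passes y[s:e].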


叛逆

Reply #4 · 2019-01-14 22:25

I found an answer that worked for me in this question: Charles Copley, in his answer, suggests that the newer tm package requires lazy = TRUE to be set explicitly.

So your code would look like this:

library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)

I also tried it without SnowballC to see if it was a combination of those two answers. It did not appear to affect the result either way.


劫难

Reply #5 · 2019-01-14 22:34

I was working on Twitter data and got the same error as in the original question while trying to convert all text to lowercase with the tm_map() function:

Warning message:
In parallel::mclapply(x, FUN, ...) :
  all scheduled cores encountered errors in user code

Installing and loading package SnowballC resolved the problem completely. Hope this helps.


迷人小祖宗

Reply #6 · 2019-01-14 22:35

I just ran into this. It took me a bit of digging but I found out what was happening.

I had the line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'.
Running this produced the error:


    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type


    > getOption("mc.cores", 2L)
    [1] 2
    >

Aha moment! Tell the 'tm_map' call to only use one core!


    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
    Error in match.fun(FUN) : object 'asPlainTextDocument' not found
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code
    >

So ... with more than one core, rather than give you the error message, 'parallel' just tells you there was an error in each core. Not helpful, parallel! I forgot the dot - the function name is supposed to be 'as.PlainTextDocument'!

So - if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again.
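
To illustrate why dropping to a single core surfaces the real message, here is a minimal sketch that uses parallel directly rather than tm. With mc.cores = 1, mclapply runs the function in the calling process, so the underlying error is raised as usual and can be read or captured. The function broken_step is a made-up stand-in for a failing transformation, and the error text is copied from the session above.

```r
library(parallel)

# Hypothetical stand-in for a transformation that fails on every document
broken_step <- function(doc) stop("object 'asPlainTextDocument' not found")

# With a single core the real error propagates and can be captured
msg <- tryCatch(
  mclapply(list("doc1", "doc2"), broken_step, mc.cores = 1L),
  error = function(e) conditionMessage(e)
)
print(msg)
```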


混吃等死

Reply #7 · 2019-01-14 22:38

I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try just running the stemDocument function on one document, so you can extract the error:

stemDocument(crude[[1]])

For me, I got an error:

Error in loadNamespace(name) : there is no package called ‘SnowballC’

So I just went ahead and installed SnowballC and it worked. Clearly, SnowballC should be a dependency.
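
A quick way to check for this before calling tm_map is sketched below. It assumes nothing about tm itself; missing_packages is a hypothetical helper name, and the package list passed to it is whatever you expect your transformations to need.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "SnowballC" is the package named in the error above
needed <- missing_packages(c("SnowballC"))
if (length(needed) > 0) {
  message("Install first: ", paste(needed, collapse = ", "))
}
```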
