Spark Task not Serializable with simple accumulator

Posted 2019-07-03 22:59

I am running this simple code:

val accum = sc.accumulator(0, "Progress")
listFilesPar.foreach { filepath =>
  accum += 1
}

where listFilesPar is an RDD[String],

which throws the following error:

org.apache.spark.SparkException: Task not serializable

Right now I don't understand what's happening. I use braces rather than parentheses because I need to write a lengthy function. I am just doing unit testing.

1 answer
男人必须洒脱
#2 · 2019-07-04 00:01

The typical cause of this is that the closure unexpectedly captures something — something you did not include in your paste, because you would never expect it to be serialized.

You can try reducing your code until you find the culprit, or turn on serialization debug logging with -Dsun.io.serialization.extendedDebugInfo=true. The output will probably show that Spark is trying to serialize something silly.
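The capture mechanism can be demonstrated without Spark at all, using plain Java serialization (which is what Spark uses here by default). The sketch below is a hypothetical reconstruction, not the asker's code: a lambda that reads a field of its enclosing object captures the whole object, so if that object is not serializable, writeObject fails — exactly the situation behind "Task not serializable". Copying the needed value into a local val first breaks the capture.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for something non-serializable the closure might drag in
// (a SparkContext, a logger, the enclosing test class, ...).
class Helper { val tag: String = "x" }   // note: does NOT extend Serializable

class Job {
  val helper = new Helper

  // Reads `helper`, i.e. `this.helper`, so the lambda captures the whole Job.
  def badClosure: String => String = s => s + helper.tag

  // Fix: copy the value into a local val; only the String is captured.
  def goodClosure: String => String = {
    val t = helper.tag
    s => s + t
  }
}

// Mimics what Spark's ClosureCleaner/serializer does before shipping a task.
def serializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }

val job = new Job
println(serializable(job.badClosure))   // false: the captured Job is not serializable
println(serializable(job.goodClosure))  // true: only a String was captured
```

In the asker's snippet the closure body only touches accum (accumulators are designed to be serialized and shipped to executors), so the offending capture is almost certainly something from the surrounding scope — for example, the closure being defined inside a non-serializable test class.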
