
Spark Task not Serializable with simple accumulator

Posted 2019-07-03 23:07

Question:

I am running this simple code:

val accum = sc.accumulator(0, "Progress");
listFilesPar.foreach {
  filepath =>
    accum += 1
}

listFilesPar is an RDD[String]

which throws the following error:

org.apache.spark.SparkException: Task not serializable

Right now I don't understand what's happening. I use braces rather than parentheses because I need to write a lengthy function inside the foreach. I am just doing unit testing.

Answer 1:

The typical cause of this is that the closure unexpectedly captures something: something you did not include in your paste, because you would never expect it to be serialized.
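For example (a minimal sketch, not your actual code: the class name ProgressSuite, the prefix field, and the broken/fixed methods are made up), a very common version of this is a closure defined inside a non-serializable class, such as a test harness that holds the SparkContext:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

class ProgressSuite {
  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("test"))
  val prefix = "/data"   // plain field of the enclosing, non-serializable class

  def broken(listFilesPar: RDD[String]): Int = {
    val accum = sc.accumulator(0, "Progress")
    // Reading `prefix` compiles to `this.prefix`, so the closure drags in
    // the whole ProgressSuite instance (including the SparkContext)
    // -> org.apache.spark.SparkException: Task not serializable
    listFilesPar.foreach { filepath =>
      if (filepath.startsWith(prefix)) accum += 1
    }
    accum.value
  }

  def fixed(listFilesPar: RDD[String]): Int = {
    val accum = sc.accumulator(0, "Progress")
    val localPrefix = prefix   // copy the field into a local val first
    // Now only the String and the accumulator are captured by the closure.
    listFilesPar.foreach { filepath =>
      if (filepath.startsWith(localPrefix)) accum += 1
    }
    accum.value
  }
}

The usual fix is exactly that local copy: make sure the closure captures only serializable local values (and the accumulator), never the enclosing instance.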

You can try to reduce your code until you find it, or just turn on serialization debug logging with -Dsun.io.serialization.extendedDebugInfo=true. You will probably see in the output that Spark is trying to serialize something silly.
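Since you mention unit testing: that flag has to reach the JVM that serializes the closures, which in a local-mode test is the test JVM itself. If the tests run through sbt, one way to do that (a sketch assuming sbt 1.x and a forked test JVM) is in build.sbt:

// Fork the test JVM so the option below is actually applied to it,
// not to the sbt launcher JVM.
Test / fork := true
Test / javaOptions += "-Dsun.io.serialization.extendedDebugInfo=true"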