I'm writing a custom Spark RDD implementation in Scala, and I'm debugging my implementation using the Spark shell. My goal for now is to get:
customRDD.count
to succeed without an Exception. Right now this is what I'm getting:
15/03/06 23:02:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/06 23:02:32 ERROR TaskSetManager: Failed to serialize task 0, not attempting to retry it.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:240)
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at java.io.ObjectStreamClass$FieldReflector.getObjFieldValues(ObjectStreamClass.java:2050)
at java.io.ObjectStreamClass.getObjFieldValues(ObjectStreamClass.java:1252)
... 45 more
The "failed to serialize task 0" catches my attention. I don't have an outstanding mental picture of what's going on I do customRDD.count
, and it's very unclear exactly what could not be serialized.
My custom RDD consists of:
- a custom RDD class
- a custom Partition class
- a custom (Scala) Iterator class (all three are sketched below)
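
In stripped-down form, the three pieces fit together roughly like this. All names and bodies here are placeholders to show the structure, not my actual code:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Placeholder partition; Spark's Partition trait already extends Serializable,
// so this is serializable as long as its fields are.
class CustomPartition(val index: Int, val params: Map[String, String])
  extends Partition

class CustomRDD(sc: SparkContext, params: Map[String, String])
    extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    Array(new CustomPartition(0, params))

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[CustomPartition]
    new CustomIterator(p.params)
  }
}

// Plain Scala iterator over the partition's parameters.
class CustomIterator(params: Map[String, String]) extends Iterator[String] {
  private val underlying = params.iterator
  def hasNext: Boolean = underlying.hasNext
  def next(): String = { val (k, v) = underlying.next(); k + "=" + v }
}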
My Spark shell session looks like this:
import custom.rdd.stuff
import org.apache.spark.SparkContext
val conf = sc.getConf
conf.set("custom", "parameters")
sc.stop()
val sc2 = new SparkContext(conf)
val mapOfThings: Map[String, String] = ...
val myRdd = new CustomRDD(sc2, mapOfThings)
myRdd.count
... (exception output) ...
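
Part of what I'm missing is a way to narrow down which object is the problem. Since Spark serializes tasks with plain Java serialization by default, one thing I can think of trying in the shell is round-tripping the pieces by hand, something like the following (trySerialize is just a hypothetical helper I'd write, not a Spark API):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper: pushes an object through plain Java serialization
// (Spark's default task serializer) and reports whether it succeeds.
def trySerialize(label: String, obj: AnyRef): Unit = {
  try {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    oos.writeObject(obj)
    oos.close()
    println(label + ": serializes OK")
  } catch {
    case e: Exception => println(label + ": FAILED - " + e)
  }
}

trySerialize("rdd", myRdd)
trySerialize("partition", myRdd.partitions(0))

But even if that works, I'd still like to understand the rules rather than poke at it blindly.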
What I'd like to know is:
- For the purposes of creating a custom RDD class, what needs to be "serializable"?
- What does it mean to be "serializable", as far as Spark is concerned? Is this akin to Java's "Serializable"?
- Does all the data returned by my RDD's Iterator (returned by the compute method) also need to be serializable? (See the hypothetical example below.)
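
For concreteness on that last point: suppose the iterator yielded instances of a plain class like this one (a made-up example, not my real element type). Would a class like this need to extend Serializable for count to succeed?

// Hypothetical element type yielded by the iterator;
// note that it does NOT extend Serializable.
class Record(val key: String, val value: String)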
Thank you so much for any clarification on this issue.