I'm getting a NullPointerException at runtime when I map a function that calls ShortestPaths.run on a global graph variable. Computing the distances directly in the shell throws no error, and calling testF() on its own works as well, but it fails when the function is called inside a map. When I remove the offending ShortestPaths.run call from testF, the example works fine. Does anyone know why this is happening?
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths

val testG = Graph.fromEdges[Int, Int](sc.parallelize(List(Edge(1, 2, 1), Edge(2, 3, 1))), 0)
val testRDD = sc.parallelize(List(1, 2, 3, 4))
def testF(): Int = {
  // Runs a distributed GraphX job against the global graph
  val distances = ShortestPaths.run(testG, Seq(15134567L))
  5
}
testF() // works fine and returns 5
val testR = testRDD.map { case (num) => (num, testF()) }
testR.take(10).foreach(println) // gives a null pointer error
As @DanieldePaula alluded to, you cannot nest distributed operations inside an RDD transformation. The function passed to map runs on the executors, where testG's underlying RDDs have no usable SparkContext, which is what surfaces as the NullPointerException. Instead, the logic within ShortestPaths.run would need to be extracted and reformulated as plain Scala code, without any mention of sc (SparkContext) methods, SparkJob, or any other driver-only mechanisms. You need to stick with serializable, worker-compatible logic.
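To give an idea of what "plain Scala code" could look like here, below is a minimal sketch, not a drop-in replacement for ShortestPaths.run. It assumes unweighted hop counts to a single landmark are enough and that the edge list is small enough to hold in memory as (src, dst) pairs. The names LocalShortestPaths and hopsTo are hypothetical and only used for illustration.

```scala
import scala.collection.immutable.Queue

// Hypothetical helper: plain Scala only, no SparkContext, RDD or Graph involved.
object LocalShortestPaths {
  // Hop count from every vertex that can reach `landmark` to `landmark`,
  // following edge direction (src -> dst). Unreachable vertices are absent.
  // Assumption: the landmark itself is always reported with distance 0,
  // even if it does not appear in `edges`.
  def hopsTo(edges: Seq[(Long, Long)], landmark: Long): Map[Long, Int] = {
    // Reverse adjacency: for each dst, the vertices with an edge into it.
    val incoming: Map[Long, Seq[Long]] =
      edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }

    // Breadth-first search backwards from the landmark.
    @annotation.tailrec
    def bfs(frontier: Queue[Long], dist: Map[Long, Int]): Map[Long, Int] =
      frontier.dequeueOption match {
        case None => dist
        case Some((v, rest)) =>
          val unseen = incoming.getOrElse(v, Seq.empty).distinct.filterNot(dist.contains)
          bfs(rest ++ unseen, dist ++ unseen.map(_ -> (dist(v) + 1)))
      }

    bfs(Queue(landmark), Map(landmark -> 0))
  }
}
```

With a function along these lines, the closure passed to map only needs the edge list as an ordinary collection (for example collected once on the driver, or shipped as a broadcast variable), so nothing in it refers to testG or sc.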