Spark Scala GraphX: Calling Shortest Path within a

2019-09-04 10:57发布

问题:

I'm having an issue in my code where I'm recieving a null pointer exception runtime error when mapping a function that calls shortest path on a global graph variable. For some reason, even though initializing distance in the terminal regularly throws no error, and calling testF() normally works as well, it doesn't work when its getting mapped. When i remove the eroneous distance call inside the testF function, the example works fine. Does anyone know why this is happening?

val testG = Graph.fromEdges[Int, Int](sc.parallelize(List(Edge(1, 2, 1), Edge(2, 3, 1))), 0)
val testRDD = sc.parallelize(List(1, 2, 3, 4))
def testF() : Int = {
     val distances = ShortestPaths.run(testG, Seq(15134567L))
     return 5
}
testF() //works fine and returns 5
val testR = testRDD.map{case(num) => (num, test())}
testR.take(10).foreach(println) //gives a null pointer error

回答1:

As @DanieldePaula alluded to - you can not nest the distributed methods within the RDD's. Instead the logic within the ShortestPaths.run would need to be extracted and reformulated as straight scala code - and without any mention of sc (SparkContext) methods, SparkJob, or any other Driver-only mechanisms. You need to stick with serializable and Worker-compatible logic.