Spark Scala GraphX: Calling Shortest Path within a

2019-09-04 10:41发布

I'm having an issue in my code where I'm recieving a null pointer exception runtime error when mapping a function that calls shortest path on a global graph variable. For some reason, even though initializing distance in the terminal regularly throws no error, and calling testF() normally works as well, it doesn't work when its getting mapped. When i remove the eroneous distance call inside the testF function, the example works fine. Does anyone know why this is happening?

val testG = Graph.fromEdges[Int, Int](sc.parallelize(List(Edge(1, 2, 1), Edge(2, 3, 1))), 0)
val testRDD = sc.parallelize(List(1, 2, 3, 4))
def testF() : Int = {
     val distances = ShortestPaths.run(testG, Seq(15134567L))
     return 5
}
testF() //works fine and returns 5
val testR = testRDD.map{case(num) => (num, test())}
testR.take(10).foreach(println) //gives a null pointer error

1条回答
老娘就宠你
2楼-- · 2019-09-04 11:08

As @DanieldePaula alluded to - you can not nest the distributed methods within the RDD's. Instead the logic within the ShortestPaths.run would need to be extracted and reformulated as straight scala code - and without any mention of sc (SparkContext) methods, SparkJob, or any other Driver-only mechanisms. You need to stick with serializable and Worker-compatible logic.

查看更多
登录 后发表回答