Getting Null pointer exception while accessing Bro

Posted 2019-07-30 03:10

Question:

I am getting a NullPointerException when I broadcast a DataFrame and try to access it inside a Spark UDF.

UDF definition-

def test_udf(parm1: String, parm2: String, parm3: String) = {
  println("Inside UDF")
  B.value.take(1).foreach { println }
  println("after print")

  ..... ....... }

sqlContext.udf.register("test_udf", test_udf _)

Broadcasting-

val B = sc.broadcast(sqlContext.sql("""Select * FROM table_a where col1='10102'""")) // Returns almost 20 MB data

Accessing UDF-

val df = sqlContext.sql("SELECT test_udf(parm1,parm2,parm3) AS test FROM table_b").take(1)

After this line I get a NullPointerException in the UDF at the following line: B.value.take(1).foreach { println }

I suspect that the broadcast is not happening correctly. Is something wrong with this code? I am using Spark 1.6.1.

Answer 1:

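You get an exception because this is not a valid Spark program:

  • Broadcasting a DataFrame object is not a meaningful operation. A DataFrame is only a driver-side handle to distributed data, so sc.broadcast ships the handle, not the rows behind it. This is why we have broadcast join hints.
  • Spark doesn't support nested operations on distributed data structures. In other words, you cannot access a DataFrame inside a UDF, because the UDF runs on the executors.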
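
Two ways around this can be sketched roughly as follows. This is only a sketch under assumptions, not a drop-in fix: it assumes a Spark 1.6 spark-shell session where sc and sqlContext already exist and table_a and table_b are registered, as in the question. The UDF body (returning the first row as a string) and the join of parm1 against col1 are arbitrary illustrations, since the question does not show the real lookup logic or join key.

```scala
// Assumes a Spark 1.6 spark-shell session: `sc` and `sqlContext` are provided,
// and table_a / table_b are registered, as in the question.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.broadcast

// Option 1: collect the small result to the driver, then broadcast local data.
// collect() pulls the rows (about 20 MB per the question) into a local array.
val localRows: Array[Row] =
  sqlContext.sql("SELECT * FROM table_a WHERE col1 = '10102'").collect()

// Broadcast the local array, not the DataFrame.
val B = sc.broadcast(localRows)

// The UDF touches only the broadcast local array, never a DataFrame.
// Returning the first row as a string is a placeholder for the real logic.
def test_udf(parm1: String, parm2: String, parm3: String): String = {
  B.value.headOption.map(_.toString).getOrElse("")
}

sqlContext.udf.register("test_udf", test_udf _)

val viaUdf = sqlContext.sql(
  "SELECT test_udf(parm1, parm2, parm3) AS test FROM table_b").take(1)

// Option 2: skip the UDF and let Spark broadcast the small side of a join.
val smallDf = sqlContext.table("table_a").where("col1 = '10102'")
val largeDf = sqlContext.table("table_b")

// Joining parm1 to col1 is an assumption for illustration only;
// substitute the real join key.
val viaJoin = largeDf
  .join(broadcast(smallDf), largeDf("parm1") === smallDf("col1"))
  .take(1)
```

Option 1 fits when the UDF needs arbitrary lookup logic over a small data set. Option 2 is usually simpler when the lookup is really a join, because Spark ships the small table to the executors itself.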