I am getting a NullPointerException when I broadcast a DataFrame and try to access it inside a Spark UDF.
UDF definition-
def test_udf(parm1: String, parm2: String, parm3: String) = {
  println("Inside UDF")
  B.value.take(1).foreach { println }
  println("after print")
  ..... .......
}
sqlContext.udf.register("test_udf", test_udf _)
Broadcasting-
val B = sc.broadcast(sqlContext.sql("""SELECT * FROM table_a WHERE col1='10102'""")) // Returns almost 20 MB of data
Accessing UDF-
val df = sqlContext.sql("SELECT test_udf(parm1,parm2,parm3) AS test FROM table_b").take(1)
After this line I get a NullPointerException inside the UDF, at the following line: B.value.take(1).foreach { println }
I suspect that the broadcast is not happening correctly. Is there something wrong with this code? I am using Spark 1.6.1.
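For comparison, here is a minimal sketch of a variant that may avoid the exception. It rests on an assumption I have not confirmed for this job: that a DataFrame is only a driver-side handle, so calling take on it inside a UDF running on an executor fails, whereas plain collected rows can be broadcast. The object name BroadcastRowsSketch, the local master, the tiny in-memory stand-ins for table_a and table_b, and the UDF's return value are all hypothetical and exist only to make the sketch self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical, self-contained sketch written against the Spark 1.6 API.
object BroadcastRowsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used only so the sketch runs on its own.
    val conf = new SparkConf().setAppName("broadcast-rows-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumption: small in-memory stand-ins for the real table_a and table_b.
    Seq(("10102", "x"), ("20000", "y")).toDF("col1", "col2").registerTempTable("table_a")
    Seq(("a", "b", "c")).toDF("parm1", "parm2", "parm3").registerTempTable("table_b")

    // Collect the rows to the driver first, then broadcast the plain Array[Row]
    // instead of the DataFrame itself.
    val rows = sqlContext.sql("SELECT * FROM table_a WHERE col1 = '10102'").collect()
    val B = sc.broadcast(rows)

    // The UDF now touches only the broadcast array, never a DataFrame.
    // Returning the first row as a string is an illustrative choice.
    sqlContext.udf.register("test_udf", (parm1: String, parm2: String, parm3: String) => {
      B.value.take(1).map(_.toString).mkString(",")
    })

    val result = sqlContext.sql("SELECT test_udf(parm1, parm2, parm3) AS test FROM table_b").take(1)
    result.foreach(println)

    sc.stop()
  }
}
```

Collecting about 20 MB to the driver should be manageable, but this is only reasonable while the filtered result stays small enough to fit in driver and executor memory.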