The following code raises the NullPointerException. Even there is Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
to prevent the null values?
case class Cols (F1: String, F2: BigDecimal, F3: Int, F4: Date, ...)
def readTable() : DataSet[Cols] = {
import sqlContext.sparkSession.implicits._
sqlContext.read.format("jdbc").options(Map(
"driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"url" -> jdbcSqlConn,
"dbtable" -> s"..."
)).load()
.select("F1", "F2", "F3", "F4")
.as[Cols]
}
import org.apache.spark.sql.{functions => func}
val j = readTable().joinWith(readTable(), func.lit(true))
readTable().filter(x =>
(if (Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
&& (x._1.F2- x._2.F2< 1)) 1 else 0) //line 51
+ ..... > 100)
I tried !(x._1.F2== null || x._2.F2== null)
and it still gets the exception.
The exception is
java.lang.NullPointerException at scala.math.BigDecimal.$minus(BigDecimal.scala:563) at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:51) at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:44) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)
Update:
I tried the following expression and the execution still hit the line x._1.F2- x._2.F2
. Is it a way to check if BigDecimal is null?
(if (!(Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
&& x._1.F2!= null && x._2.F2!= null)) 0
else (if (x._1.F2- x._2.F2< 1) 1 else 0))
Update 2
The exception is gone after I wrapped the minus into (math.abs((l.F2 - r.F2).toDouble)
.
Why?