How to check for null on scala.math.BigDecimal?


Question:

The following code raises a NullPointerException, even though there is a check Option(x._1.F2).isDefined && Option(x._2.F2).isDefined to guard against null values.

case class Cols (F1: String, F2: BigDecimal, F3: Int, F4: Date, ...)

def readTable(): Dataset[Cols] = {
  import sqlContext.sparkSession.implicits._

  sqlContext.read.format("jdbc").options(Map(
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" -> jdbcSqlConn,
    "dbtable" -> s"..."
  )).load()
    .select("F1", "F2", "F3", "F4")
    .as[Cols]
}

import org.apache.spark.sql.{functions => func}
val j = readTable().joinWith(readTable(), func.lit(true))
j.filter(x =>
  (if (Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
       && (x._1.F2 - x._2.F2 < 1)) 1 else 0)  // line 51
  + ..... > 100)

I tried !(x._1.F2 == null || x._2.F2 == null) and it still throws the exception.

The exception is:

java.lang.NullPointerException
        at scala.math.BigDecimal.$minus(BigDecimal.scala:563)
        at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:51)
        at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:44)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Update: I tried the following expression and execution still hits the line x._1.F2 - x._2.F2. Is there a way to check whether a BigDecimal is null?

(if (!(Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
       && x._1.F2 != null && x._2.F2 != null)) 0
 else (if (x._1.F2 - x._2.F2 < 1) 1 else 0))

Update 2

The exception is gone after I wrapped the subtraction in math.abs((l.F2 - r.F2).toDouble). Why?

Answer 1:

Try adding this to your if statement:

&& (x._1.F2 && x._2.F2) != null

I've had a similar issue in Java and that's what has worked for me.
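
As written, that check doesn't compile in Scala, since && isn't defined between two BigDecimal values. A minimal sketch of what it presumably means, testing each field for null separately (bothNonNull is a hypothetical helper name, not from the post):

// Hedged sketch: test each field for null separately, which appears to be
// the intent of the Java-style check suggested above.
def bothNonNull(l: BigDecimal, r: BigDecimal): Boolean =
  l != null && r != null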



Answer 2:

Looking at the source code for BigDecimal, on line 563: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/math/BigDecimal.scala#L563

It may be possible that x._1.F2.bigDecimal or x._2.F2.bigDecimal is null, though I'm not really sure how that would happen, given that the constructor checks for that. But maybe check for null there and see if that solves the problem?
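
A minimal sketch of that extra check, assuming the underlying java.math.BigDecimal can end up null even when the Scala wrapper itself is not (safeDiffBelowOne is a hypothetical helper name, not from the post):

// Guard against a null wrapper, and against a non-null wrapper whose
// underlying java.math.BigDecimal is null, before subtracting.
def safeDiffBelowOne(l: BigDecimal, r: BigDecimal): Boolean =
  l != null && l.bigDecimal != null &&
  r != null && r.bigDecimal != null &&
  (l - r < 1)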

By the way, you should really avoid all the ._1 and ._2 accessors. You should be able to do something like:

val (l: Cols, r: Cols) = x

to extract the tuple values.
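
Applied to the joined Dataset j from the question, the extraction could look like this sketch (reusing the hypothetical safeDiffBelowOne guard from above):

j.filter { x =>
  val (l, r) = x  // destructure once instead of repeating ._1/._2
  safeDiffBelowOne(l.F2, r.F2)
}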