To handle Spark exceptions in RDD operations, I can use the following approach, which appends an extra exceptions column:
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val df: DataFrame = ...

// Append two values to every row: the parsed member_id and, on failure, the exception text.
val rddWithExcep = df.rdd.map { row: Row =>
  val memberIdStr = row.getAs[String]("member_id")
  val memberIdInt = Try(memberIdStr.toInt) match {
    case Success(integer) => List(integer, null)
    case Failure(ex)      => List(null, ex.toString)
  }
  Row.fromSeq(row.toSeq.toList ++ memberIdInt)
}

// Extend the original schema with the two new columns.
val castWithExceptionSchema = StructType(df.schema.fields ++ Array(
  StructField("member_id_int", IntegerType, true),
  StructField("exceptions", StringType, true)))

val castExcepDf = sparkSession.sqlContext.createDataFrame(rddWithExcep, castWithExceptionSchema)
castExcepDf.printSchema()
castExcepDf.show()
Is it possible to handle such exceptions in Spark SQL? Currently, in case of any error, Spark SQL simply returns a null value and hides the error. For example, a division by 0 results in a null value rather than an error.
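A minimal sketch of the behavior I mean, assuming a SparkSession named spark with default (non-ANSI) settings:

// Division by zero is silently turned into null instead of raising an error
spark.sql("SELECT 10 / 0 AS result").show()   // prints a single row containing null

// An invalid cast behaves the same way: null instead of an exception
spark.sql("SELECT CAST('abc' AS INT) AS member_id_int").show()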
In my opinion, this is a very serious issue in Spark SQL, because it can simply produce unexpected/wrong data that you won't even notice.
Is it possible to override this behavior and let Spark fail with an appropriate detailed exception?