I have the query below, which is supposed to compute the average of a column and return a single number as the result.
val avgVal = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first().getDouble(0)
I'm seeing inconsistent behavior with this statement. It often fails with the error below, even though the same query returns non-NULL results when executed directly through Hive.
18/05/10 11:01:12 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException: Value at index 0 in null
java.lang.NullPointerException: Value at index 0 in null
at org.apache.spark.sql.Row$class.getAnyValAs(Row.scala:475)
at org.apache.spark.sql.Row$class.getDouble(Row.scala:243)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:192)
The reason I use HiveContext instead of SQLContext is that the latter doesn't support some of the aggregation functions that I use extensively in my code.
Could you please help me understand why this problem occurs and how to solve it?
You need to split the query execution and the value retrieval into two parts:
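A minimal sketch of that split, using the query from the question (this assumes `hiveContext` is already in scope, as in the original snippet):

```scala
// Hypothetical sketch: run the query first, then check the row for null
// before extracting the value.
val row = hiveContext.sql(
  "select round(avg(amount), 4) from users.payment " +
  "where dt between '2018-05-09' and '2018-05-09'").first()

// avg() evaluates to NULL when no rows match the filter (or all values
// are NULL), so guard the extraction instead of calling getDouble(0) directly.
val avgVal: Option[Double] =
  if (row.isNullAt(0)) None else Some(row.getDouble(0))
```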
This avoids the NPE. The same null check is needed when working with a List or an array of values.
For insert or update queries, you should additionally surround the call with a `try...catch` block to handle runtime exceptions. Now let's analyze the case and the possible reasons why this exception can be thrown.
If the row returned above contains a null at index 0, i.e. something like `[null]` (which happens when `avg()` runs over zero matching rows, or over values that are all NULL), then trying to call `getDouble(0)` will lead to:

java.lang.NullPointerException: Value at index 0 in null
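This failure mode can be reproduced without a cluster, because the check lives in `Row` itself (the same class that appears in the stack trace above):

```scala
import org.apache.spark.sql.Row

// A row whose only column is null — the shape avg() produces
// when there is no input. Extracting it as a Double throws.
val row = Row(null)

row.isNullAt(0)   // true
row.getDouble(0)  // throws java.lang.NullPointerException: Value at index 0 in null
```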
You can try the following approach:
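One safe variant is to collect the result and pattern-match on it, falling back to a default. The `0.0` fallback here is an assumption; pick whatever "no data" value fits your use case:

```scala
// Hypothetical safer retrieval: treat both "no rows" and "NULL average"
// as the absence of a result.
val result: Double = hiveContext.sql(
    "select round(avg(amount), 4) from users.payment " +
    "where dt between '2018-05-09' and '2018-05-09'")
  .collect()
  .headOption match {
    case Some(row) if !row.isNullAt(0) => row.getDouble(0)
    case _                             => 0.0 // assumed default
  }
```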
If you check the Spark sources, you'll see that the library does the opposite check internally: instead of returning null, it explicitly throws when the slot is null.
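The `Row.getDouble` call in your stack trace goes through a private helper roughly like the following (paraphrased from `Row.scala`; the exception message matches the one you reported, so `getDouble(0)` can never succeed on a NULL aggregate result):

```scala
// Paraphrase of the helper behind Row.getDouble (see Row.scala in the
// stack trace): a null slot is turned into an NPE rather than returned.
private def getAnyValAs[T <: AnyVal](i: Int): T =
  if (isNullAt(i)) throw new NullPointerException(s"Value at index $i in null")
  else getAs[T](i)
```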