Consider a DataFrame with the following schema:
root
 |-- values: array (nullable = true)
 |    |-- element: double (containsNull = true)
with content:
+-----------+
|     values|
+-----------+
|[1.0, null]|
+-----------+
Now I want to pass this values column to a UDF:
val inspect = udf((data: Seq[Double]) => {
  data.foreach(println)
  println()
  data.foreach(d => println(d))
  println()
  data.foreach(d => println(d == null))
  ""
})
df.withColumn("dummy", inspect($"values"))
I'm really confused by the output of the println statements above:
1.0
null

1.0
0.0

false
false
My questions:
- Why is foreach(println) not giving the same output as foreach(d => println(d))?
- How can the Double be null in the first println statement? I thought Scala's Double cannot be null.
- How can I filter null values in my Seq other than by filtering out 0.0, which isn't really safe? Should I use Seq[java.lang.Double] as the type for my input in the UDF and then filter nulls? (This works, but I'm unsure if that is the way to go.)
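For reference, here is a minimal sketch of the Seq[java.lang.Double] variant mentioned in the last question. The object and helper names (DropNullsExample, dropNulls) are hypothetical, and the Spark UDF wrapper is left out so that the snippet runs as plain Scala; it only illustrates the null filtering itself, not necessarily the recommended approach.

```scala
object DropNullsExample {
  // Hypothetical helper: accepts boxed doubles (which may be null)
  // and returns only the non-null values as Scala Doubles.
  def dropNulls(data: Seq[java.lang.Double]): Seq[Double] =
    data.filter(_ != null).map(_.doubleValue)

  def main(args: Array[String]): Unit = {
    // Same content as the example row: [1.0, null]
    val values: Seq[java.lang.Double] = Seq(java.lang.Double.valueOf(1.0), null)

    // The null element is dropped instead of turning into 0.0
    println(dropNulls(values)) // prints List(1.0)
  }
}
```

Because the elements stay boxed until after the filter, a real 0.0 in the array is kept and only the nulls are removed.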
Note that I'm aware of this Question, but my question is specific to array-type columns.