I have a DataFrame with a column of ArrayType that holds integer values. When there are no values, the array contains exactly one element, and that element is null.
Important: the column itself is not null; it is an array whose single element is null.
> val df: DataFrame = Seq(("foo", Seq(Some(2), Some(3))), ("bar", Seq(None))).toDF("k", "v")
df: org.apache.spark.sql.DataFrame = [k: string, v: array<int>]
> df.show()
+---+------+
|  k|     v|
+---+------+
|foo|[2, 3]|
|bar|[null]|
+---+------+
Question: how can I filter to get only the rows whose array contains the null value?
Thanks for your help.
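The desired result here is just the "bar" row:
+---+------+
|  k|     v|
+---+------+
|bar|[null]|
+---+------+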
What I have tried thus far:
> df.filter(array_contains(df("v"), 2)).show()
+---+------+
|  k|     v|
+---+------+
|foo|[2, 3]|
+---+------+
For null, however, this does not seem to work:
> df.filter(array_contains(df("v"), null)).show()
org.apache.spark.sql.AnalysisException: cannot resolve 'array_contains(`v`, NULL)' due to data type mismatch: Null typed values cannot be used as arguments;
or
> df.filter(array_contains(df("v"), None)).show()
java.lang.RuntimeException: Unsupported literal type class scala.None$ None
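In case it is relevant, here is the workaround I am considering (an untested sketch; I would prefer a built-in solution if one exists): a UDF that receives the array as Seq[Integer], boxed so that null elements are preserved, and checks for a null element. Since the "no values" arrays always have a single null element, testing the first element directly might also work.
> import org.apache.spark.sql.functions.udf
> // sketch: boxed java.lang.Integer so null elements survive the conversion into the UDF
> val containsNull = udf((xs: Seq[Integer]) => xs != null && xs.contains(null))
> df.filter(containsNull(df("v"))).show()
> // alternative sketch, relying on the fact that a "no values" array has exactly one (null) element
> df.filter(df("v").getItem(0).isNull).show()
Is something like this the right direction, or is there a cleaner way with the built-in functions?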