I have a Spark DataFrame which is grouped by a column and aggregated with a count:
df.groupBy('a).agg(count("a")).show
+----+--------+
|   a|count(a)|
+----+--------+
|null|       0|
| -90|   45684|
+----+--------+
df.select('a).filter('a.isNull).count
returns
warning: there was one feature warning; re-run with -feature for details
res9: Long = 26834
which clearly shows that the null values were not counted in the first aggregation.
What is the reason for this behaviour? I would have expected that, if null shows up
as a group at all, its count would be reported correctly.
Yes, count applied to a specific column does not count null values. This follows the SQL-92 standard, which specifies that null values are eliminated before an aggregate over a column expression is computed; only count(*) counts every row. If you want to include the null values, use:
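df.groupBy('a).agg(count("*")).show

count("*") counts all rows in each group, nulls included. As a sketch (assuming import org.apache.spark.sql.functions._ and spark.implicits._ are in scope, the latter for the 'a symbol syntax), you can put both counts side by side to see the difference per group:

df.groupBy('a)
  .agg(count("*").as("rows"), count("a").as("non_null_a"))
  .show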