Here is some example data:
Si K Ca Ba Fe Type
71.78 0.06 8.75 0 0 1
72.73 0.48 7.83 0 0 1
72.99 0.39 7.78 0 0 1
72.61 0.57 na 0 0 na
73.08 0.55 8.07 0 0 1
72.97 0.64 8.07 0 na 1
73.09 na 8.17 0 0 1
73.24 0.57 8.24 0 0 1
72.08 0.56 8.3 0 0 1
72.99 0.57 8.4 0 0.11 1
na 0.67 8.09 0 0.24 1
We can load the data into Spark through sparklyr with the following code:
sdf_copy_to(sc, sampledata)
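For anyone who wants to reproduce this, a self-contained version of that call might look like the sketch below. It assumes a local Spark installation that sparklyr can reach, and that the literal string "na" in the sample stands for a missing value. The table name "sampledata" and the variable sampledata_tbl are my own choices, not something from the original setup.

```r
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# The sample from the question; the string "na" is parsed as a missing value.
sampledata <- read.table(
  text = "Si K Ca Ba Fe Type
71.78 0.06 8.75 0 0 1
72.73 0.48 7.83 0 0 1
72.99 0.39 7.78 0 0 1
72.61 0.57 na 0 0 na
73.08 0.55 8.07 0 0 1
72.97 0.64 8.07 0 na 1
73.09 na 8.17 0 0 1
73.24 0.57 8.24 0 0 1
72.08 0.56 8.3 0 0 1
72.99 0.57 8.4 0 0.11 1
na 0.67 8.09 0 0.24 1",
  header = TRUE,
  na.strings = "na"
)

# Copy the local data frame to Spark and keep a reference to the remote table.
sampledata_tbl <- sdf_copy_to(sc, sampledata, name = "sampledata", overwrite = TRUE)
```

Calling spark_disconnect(sc) closes the session once you are done.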
I am looking for a query that returns the columns that contain NA values, together with the number of NAs in each, for example:
si k ca fe
1 1 1 2
This problem is actually a bit tricky due to the tbl_spark implementation and incompatibilities between Spark and R semantics. Even if you could apply colSums, Spark SQL doesn't allow implicit conversions between booleans and numerics. This means you have to apply as.numeric explicitly:
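As a side note, the semantic gap is easy to see in plain R with no Spark involved. The sketch below uses a made-up vector si, built from a few Si values in the question, and only illustrates the coercion rule. It is not the Spark query itself.

```r
# Illustrative vector: a few Si values from the question, including the missing one.
si <- c(72.99, 73.08, NA)

# Local R: sum() silently coerces the logical vector from is.na() to 0/1.
implicit_count <- sum(is.na(si))

# The same count with the cast written out, which is the form Spark SQL needs.
explicit_count <- sum(as.numeric(is.na(si)))
```

Both expressions give the same count locally. The difference only matters once the expression is translated to SQL, where the boolean has to be cast before it can be summed.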