Hi, I have the following issue:
numeric.registerTempTable("numeric").
All the values that I want to filter on are literal null strings and not N/A or Null values.
I tried these three options:
numeric_filtered = numeric.filter(numeric['LOW'] != 'null').filter(numeric['HIGH'] != 'null').filter(numeric['NORMAL'] != 'null')
numeric_filtered = numeric.filter(numeric['LOW'] != 'null' AND numeric['HIGH'] != 'null' AND numeric['NORMAL'] != 'null')
sqlContext.sql("SELECT * from numeric WHERE LOW != 'null' AND HIGH != 'null' AND NORMAL != 'null'")
Unfortunately, numeric_filtered is always empty. I checked and numeric has data that should be filtered based on these conditions.
Here are some sample values:
Low High Normal
3.5 5.0 null
2.0 14.0 null
null 38.0 null
null null null
1.0 null 4.0
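You are using logical conjunction (AND). This means that every column has to be different from 'null' for a row to be included. Take the chained filter version as an example: each filter call drops the rows where its column equals 'null'. In your sample data NORMAL is 'null' in the first four rows and HIGH is 'null' in the last one, so no row survives.
All the remaining methods you've tried follow exactly the same pattern. What you need here is a logical disjunction (OR).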
or with raw SQL:
See also: Pyspark: multiple conditions in when clause