I'm dealing with several Spark `DataFrame`s that have a lot of null values in many columns. I want to get one non-null value from each column, so I can check whether that value can be converted to a datetime.
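For context, the check I want to run on each sampled value is roughly this (the helper and the format string are just illustrative assumptions, not my real code):

```python
from datetime import datetime

# Hypothetical helper: can a sampled value be parsed as a datetime?
# The format string is an assumption; real data may use other formats.
def looks_like_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    try:
        datetime.strptime(str(value), fmt)
        return True
    except (ValueError, TypeError):
        return False
```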
I tried `df.na.drop().first()`, hoping it would drop all rows containing any null value, and that I could then take the first row of the remaining `DataFrame`, which would have all non-null values. But many of the `DataFrame`s have so many columns with lots of null values that `df.na.drop()` returns an empty `DataFrame`.
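Here is roughly what that first attempt looked like (`df` stands for any one of my DataFrames):

```python
# Drop every row containing at least one null, then take the first
# surviving row. first() returns None when the DataFrame is empty,
# which is exactly what happens for my wide, null-heavy DataFrames.
row = df.na.drop().first()
if row is not None:
    non_null_values = row.asDict()  # one non-null value per column
```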
I also tried finding out whether any column contains only null values, so that I could simply drop those columns before trying the approach above, but that still didn't solve the problem (a rough sketch of that check is shown below). Any idea how I can accomplish this efficiently? This code will be run many times on huge `DataFrame`s.
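For reference, the all-null-column check I tried was along these lines (a sketch, not my exact code):

```python
from pyspark.sql import functions as F

# Count non-null values per column in a single pass; F.count(col)
# ignores nulls, so a count of zero means the column is entirely null.
counts = df.agg(*[F.count(F.col(c)).alias(c) for c in df.columns]).first()
all_null_columns = [c for c in df.columns if counts[c] == 0]
df_trimmed = df.drop(*all_null_columns)
```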