I have a DataFrame backed by a SQL table. I want to filter this DataFrame
based on whether the value of a certain column is numeric.
val df = sqlContext.sql("select * from myTable");
val filter = df.filter("ISNUMERIC('col_a')");
//I want filter to be a dataframe of df where the values in col_a are numeric.
My current solution doesn't work. How can I achieve this?
You can filter using a regex match on the column value:
df.filter(row => row.getAs[String]("col_a").matches("""\d+"""))
Hope this helps!
You can cast the field in question to DECIMAL
and inspect the result:
filter("CAST(col_a AS DECIMAL) IS NOT NULL")
Optionally, you can pass a precision and scale to narrow the valid numbers down to a specific maximum number of digits:
filter("CAST(col_a AS DECIMAL(18,8)) IS NOT NULL")
Shankar Koirala's answer covers integers effectively. The regex below also covers doubles, with an optional negative sign and null handling (note that this is a Java variation):
df.filter( df.col("col_a").isNotNull() )
.filter( ( FilterFunction<Row> )
row -> row.getString( row.fieldIndex( "col_a" ) ).matches( "-?\\d+\\.?\\d*" ) )
spark.sql("select phone_number, (CASE WHEN LENGTH(REGEXP_REPLACE(phone_number),'[^0-9]', '')) = LENGTH(TRIM(phone_number)) THEN true ELSE false END) as phone_number_isNumeric from table").show()
This is really an old post, but still if anybody looking for alternate solution.
REGEXP_REPLACE(phone_number),'[^0-9]', ''
removes all characters except numeric
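As a quick illustration with hypothetical literals:

spark.sql("SELECT REGEXP_REPLACE('12a34', '[^0-9]', '') AS digits_only").show()
// returns '1234': length 4 versus LENGTH(TRIM('12a34')) = 5, so '12a34' would be flagged non-numeric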
You can also use a Spark UDF.
Refer to the post below for more details:
Need the equivalent of SQL IsNumeric function in spark sql
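A minimal sketch of such a UDF, assuming col_a is a string column (isNumeric here is an illustrative name, not a built-in function):

import org.apache.spark.sql.functions.{col, udf}
import scala.util.Try

// Treat a value as numeric if it parses as a BigDecimal; nulls are rejected
val isNumeric = udf((s: String) => s != null && Try(BigDecimal(s)).isSuccess)
val numericOnly = df.filter(isNumeric(col("col_a")))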