Possible to filter Spark dataframe by ISNUMERIC function?

Posted 2020-06-23 08:12

Question:

I have a DataFrame for a SQL table. I want to filter this DataFrame so that only the rows where a certain column's value is numeric are kept.

val df = sqlContext.sql("select * from myTable")
val filter = df.filter("ISNUMERIC('col_a')")
// I want filter to be a DataFrame of df where the values in col_a are numeric.

My current solution doesn't work. How can I achieve this?

Answer 1:

You can filter with a regex match on the column value:

df.filter(row => row.getAs[String]("col_a").matches("""\d+"""))
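Note that the pattern """\d+""" matches only non-negative integers; see the answers below for decimals and negative signs.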

Hope this helps!



Answer 2:

You can cast the field in question to DECIMAL and inspect the result:

filter("CAST(col_a AS DECIMAL) IS NOT NULL")

Optionally, you can pass precision and scale to narrow the valid numbers down to a specific size:

filter("CAST(col_a AS DECIMAL(18,8)) IS NOT NULL")


Answer 3:

Shankar Koirala's answer handles integers effectively. The regex below also covers doubles, with an optional negative sign, and nulls are filtered out first (note that this is a Java variation, so the required imports are included):

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Row;

df.filter( df.col("col_a").isNotNull() )
  .filter( ( FilterFunction<Row> )
      row -> row.getString( row.fieldIndex( "col_a" ) ).matches( "-?\\d+\\.?\\d*" ) )


Answer 4:

spark.sql("select phone_number, (CASE WHEN LENGTH(REGEXP_REPLACE(phone_number),'[^0-9]', '')) = LENGTH(TRIM(phone_number)) THEN true ELSE false END) as phone_number_isNumeric from table").show()

This is really an old post, but here is an alternate solution in case anybody is still looking for one.

REGEXP_REPLACE(phone_number, '[^0-9]', '')

removes all non-numeric characters, so a value is flagged as numeric when the digits-only result is the same length as the trimmed original.
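For reference, the same length comparison can be expressed with the DataFrame API (a sketch assuming a DataFrame df with a string phone_number column):

import org.apache.spark.sql.functions.{col, length, regexp_replace, trim}

// A value counts as numeric when stripping non-digits leaves the trimmed value's length unchanged.
val withFlag = df.withColumn(
  "phone_number_isNumeric",
  length(regexp_replace(col("phone_number"), "[^0-9]", "")) === length(trim(col("phone_number")))
)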



Answer 5:

You can also use a Spark UDF.

Refer to the question below for more details:

Need the equivalent of SQL IsNumeric function in spark sql
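For completeness, a minimal UDF sketch (assuming the df and col_a names from the question; scala.util.Try is used so the parse attempt works on any Scala version Spark supports):

import scala.util.Try
import org.apache.spark.sql.functions.{col, udf}

// Returns true when the trimmed string parses as a Double; the null check keeps the UDF null-safe.
val isNumeric = udf((s: String) => s != null && Try(s.trim.toDouble).isSuccess)

val numericRows = df.filter(isNumeric(col("col_a")))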