SQL-Functions with schemaRDD using language integr

Posted 2019-08-02 03:26

Question:

I want to filter a SchemaRDD using language-integrated SQL with SQL functions. For example, I want to run:

SELECT name FROM people WHERE name LIKE '%AHSAN%' AND name regexp '^[A-Z]{20}$'

How can I use such SQL functions in people.where()?

Reference:

For language integrated SQL, I am following the example given here.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val people: RDD[Person] = ... // An RDD of case class objects, from the first example.
// The following is the same as 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
val teenagers = people.where('age >= 10).where('age <= 19).select('name)
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)

Thanks in advance!

Answer 1:

SQL functions such as like and rlike can be used as infix operators on a column, the same way the numeric comparison operators are. For example:

people.where('name like "%AHSAN%").where('name rlike "^[A-Z]{20}$").select('name)

There is no regexp in Spark SQL, but rlike does the same thing.
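To make the difference between the two predicates concrete, here is a minimal plain-Scala sketch that needs no Spark at all. It assumes that like must match the whole string (with % and _ as wildcards) and that rlike succeeds when the Java regex is found anywhere in the string, which is my understanding of how Spark evaluates them. The object LikeVsRlike and its helpers are hypothetical names for illustration only, not part of the Spark API, and escape characters in LIKE patterns are not handled.

```scala
import java.util.regex.Pattern

// Hypothetical helper object, for illustration only (not a Spark API).
object LikeVsRlike {
  // Translate a SQL LIKE pattern into a Java regex:
  // % becomes .*, _ becomes ., every other character is matched literally.
  def likeToRegex(pattern: String): String =
    pattern.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // Assumed LIKE semantics: the pattern must cover the whole value.
  def like(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

  // Assumed RLIKE semantics: the regex may match anywhere in the value.
  def rlike(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()

  def main(args: Array[String]): Unit = {
    // A 20-letter uppercase name containing AHSAN, plus two that should be rejected.
    val twenty = ("X" * 7) + "AHSAN" + ("X" * 8)
    val names  = Seq(twenty, "AHSAN", twenty + "Y")

    // Same shape as the query in the question: both predicates must hold.
    val kept = names.filter(n => like(n, "%AHSAN%") && rlike(n, "^[A-Z]{20}$"))
    kept.foreach(println)
  }
}
```

Under these assumptions the ^ and $ anchors in the question's pattern matter: without them, rlike would also accept any longer name that merely contains 20 consecutive uppercase letters.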