I am using Spark 1.3.0 and Spark Avro 1.0.0.
I am working from the example on the repository page. The following code works well:
val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")
But what if I need to check whether the doctor string contains a substring? Since the expression is written inside a string, how do I express a "contains"?
You can use contains (this works with an arbitrary sequence):
df.filter($"foo".contains("bar"))
like (SQL LIKE with simple SQL patterns, where _ matches an arbitrary character and % matches an arbitrary sequence):
df.filter($"foo".like("bar"))
or rlike (the same as like, but with Java regular expressions):
df.filter($"foo".rlike("bar"))
depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
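As a minimal sketch, here is how the three forms could be applied to the episodes DataFrame from the question; the string column name (title) and the patterns are assumptions for illustration, not taken from the original data:

import com.databricks.spark.avro._   // spark-avro implicits, as used in the question's example
import sqlContext.implicits._        // enables the $"..." column syntax

val df = sqlContext.read.avro("src/test/resources/episodes.avro")

// Column-expression forms:
df.filter($"title".contains("Doctor"))   // substring match
df.filter($"title".like("%Doctor%"))     // SQL LIKE pattern
df.filter($"title".rlike("Doctor$"))     // Java regular expression

// The same conditions written as SQL expression strings:
df.filter("title LIKE '%Doctor%'")
df.filter("title RLIKE 'Doctor$'")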
In PySpark / Spark SQL syntax:
where column_n like 'xyz%'
might not work.
Use:
where column_n RLIKE '^xyz'
This works perfectly fine.
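The same RLIKE clause can also be used once the DataFrame is registered as a table and queried with plain SQL. A minimal sketch in Scala, where the table name episodes and the column title are placeholders chosen for illustration:

// Sketch: RLIKE inside a full SQL query; names are illustrative only.
df.registerTempTable("episodes")
val matched = sqlContext.sql("SELECT * FROM episodes WHERE title RLIKE '^The'")
matched.show()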