Filter df when values match part of a string in PySpark

Posted 2019-02-01 23:41


I have a large pyspark.sql.dataframe.DataFrame and I want to keep (i.e. filter for) all rows where the URL saved in the location column contains a pre-determined string, e.g. ''.

I have tried df.filter(sf.col('location').contains('')) but this throws a

TypeError: 'Column' object is not callable

How do I get around this and filter my df properly? Many thanks in advance!


You can use plain SQL in filter

df.filter("location like ''")

or with DataFrame column methods
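Both variants can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `filter_with_sql` and `filter_with_column` are made up for illustration, and it assumes the goal is a substring match, which with LIKE means wrapping the search string in `%` wildcards.

```python
def filter_with_sql(df, column, substring):
    """Variant 1: pass a SQL predicate string to DataFrame.filter."""
    # Hypothetical helper. Reject characters that would break the SQL
    # string literal (quote, backslash). Note that % and _ inside
    # `substring` still act as LIKE wildcards.
    if any(ch in substring for ch in "'\\"):
        raise ValueError("substring must not contain a quote or backslash")
    return df.filter(f"{column} LIKE '%{substring}%'")


def filter_with_column(df, column, substring):
    """Variant 2: use the Column.like method instead of a SQL string."""
    # df[column] is a Column; like() takes the same %-wildcard pattern.
    return df.filter(df[column].like(f"%{substring}%"))
```

With a real DataFrame this would be called as `filter_with_column(df, "location", "example.com")`, where `"example.com"` is only a placeholder for the string you are looking for.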



I had to do the same task as you and contains worked:
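A minimal sketch of such a contains-based filter is shown here. The helper name `filter_contains` is hypothetical. As far as I know, `Column.contains` is only available from Spark 2.2 onwards; on older versions the attribute access is treated as a nested-field lookup that returns another Column, which would explain the `'Column' object is not callable` error in the question.

```python
def filter_contains(df, column, substring):
    """Keep rows whose `column` value contains `substring` literally."""
    # Hypothetical helper. Column.contains does a plain substring match,
    # so no % wildcards are needed, and % or _ in `substring` have no
    # special meaning. df[column] is equivalent to sf.col(column) here
    # (with `from pyspark.sql import functions as sf`).
    return df.filter(df[column].contains(substring))
```

For example, `filter_contains(df, "location", "example.com")`, with `"example.com"` standing in for the actual search string.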


Maybe they have corrected it.