Filter df when values match part of a string in pyspark

Posted 2019-02-01 23:41

Question:

I have a large pyspark.sql.dataframe.DataFrame and I want to keep (i.e. filter for) all rows where the URL saved in the location column contains a predetermined string, e.g. 'google.com'.

I have tried df.filter(sf.col('location').contains('google.com')) but this throws a

TypeError: 'Column' object is not callable

How do I go about filtering my df properly? Many thanks in advance!
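For reference, a minimal sketch of the kind of setup involved; the sample URLs, the SparkSession boilerplate, and the sf alias for pyspark.sql.functions are assumptions for illustration, not from the question:

from pyspark.sql import SparkSession
import pyspark.sql.functions as sf

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the large DataFrame described in the question.
df = spark.createDataFrame(
    [('https://www.google.com/search?q=spark',),
     ('https://example.org/index.html',)],
    ['location'],
)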

Answer 1:

You can use plain SQL in filter:

df.filter("location like '%google.com%'")

or with DataFrame column methods:

df.filter(df.location.like('%google.com%'))
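Both variants return the same rows; a quick sketch against the toy DataFrame above (the % wildcards make LIKE match the substring anywhere in the value):

# SQL-style LIKE; % matches any sequence of characters on either side.
df.filter("location like '%google.com%'").show()

# Equivalent Column method; like() uses the same SQL wildcard syntax.
df.filter(df.location.like('%google.com%')).show()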


Answer 2:

I had to do the same task as you and contains worked:

df.where(df.location.contains('google.com'))

Maybe they have fixed it in a newer Spark version.
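Applied to the toy DataFrame above; the negated variant with ~ is an addition for illustration, not part of the original answer:

# contains() does a plain substring match, so no wildcards are needed.
df.where(df.location.contains('google.com')).show()

# Negate the condition with ~ to keep only the non-matching rows.
df.where(~df.location.contains('google.com')).show()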