I am brand new to PySpark and want to translate my existing pandas/Python code to PySpark.
I want to subset my DataFrame so that only the rows whose 'original_problem' field contains one of the keywords I'm looking for are returned.
Below is the Python code I tried in PySpark:
def pilot_discrep(input_file):
    df = input_file
    searchfor = ['cat', 'dog', 'frog', 'fleece']
    df = df[df['original_problem'].str.contains('|'.join(searchfor))]
    return df
When I try to run the above, I get the following error:
AnalysisException: u"Can't extract value from original_problem#207: need struct type but got string;"
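My understanding of the error is that a PySpark Column has no pandas-style `.str` accessor, so `df['original_problem'].str` is read as an attempt to pull a field named `str` out of a struct column, which fails because the column is a plain string. For comparison, here is a minimal sketch of how the same filter might be expressed with PySpark's own `Column.rlike` and `DataFrame.filter`. It assumes `input_file` is already a Spark DataFrame and that the keywords contain no regex metacharacters; the variable name `pattern` is only illustrative.

```python
searchfor = ['cat', 'dog', 'frog', 'fleece']

# Same alternation regex as in the pandas version: 'cat|dog|frog|fleece'
pattern = '|'.join(searchfor)

def pilot_discrep(input_file):
    # Assumption: input_file is a pyspark.sql.DataFrame with a string
    # column named 'original_problem'.
    df = input_file
    # Column.rlike keeps rows where the regex matches anywhere in the value,
    # which roughly corresponds to pandas' str.contains with a regex.
    df = df.filter(df['original_problem'].rlike(pattern))
    return df
```

Note that `rlike` uses Java regular expression syntax, so keywords containing special characters would need escaping before being joined.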