When I write the following pyspark command:
# comment 1
df = df.withColumn('explosion', explode(col('col1'))).filter(col('explosion')['sub_col1'] == 'some_string') \
# comment 2
.withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2'])) \
# comment 3
.withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3']))
I get the following error:
.withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
^
IndentationError: unexpected indent
Is there a way to write comments between the lines of multi-line commands in pyspark?
This is not a `pyspark` issue, but rather a violation of Python syntax. Consider the following example and the error it produces:
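A minimal plain-Python reproduction (no Spark involved); the failing source is held in a string and fed to `compile` here so the error can be caught and printed:

```python
# The string below mimics the shape of the failing command: a complete
# statement, a backslash continuation, a full-line comment, and an
# indented continuation line.
broken = (
    "a = 1 \\\n"     # physical line ends with a backslash
    "# comment\n"    # the comment truncates the joined logical line
    "    + 2\n"      # leaving this indented line as an orphaned statement
)

try:
    compile(broken, "<example>", "exec")
except IndentationError as err:
    print(type(err).__name__ + ":", err.msg)  # IndentationError: unexpected indent
```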
The `\` is a continuation character, and Python interprets anything on the next line as occurring immediately after it, causing your error. One way around this is to use parentheses instead:
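For instance, once the whole expression is wrapped in parentheses, full-line comments can sit between chained method calls (a plain-Python sketch of the same idea):

```python
# Parentheses enable implicit line joining, so each piece of the
# chain can be preceded by a comment on its own line.
print(("some_string"
       # comment 1
       .upper()
       # comment 2
       .replace("_", " ")))  # prints SOME STRING
```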
When assigning to a variable, this would look like:
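For example (same toy chain as above, now bound to a name):

```python
result = ("some_string"
          # comment 1
          .upper()
          # comment 2
          .replace("_", " "))
print(result)  # SOME STRING
```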
Or in your case:
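An untested sketch of your snippet rewritten this way; it assumes `col`, `explode`, and `from_unixtime` come from `pyspark.sql.functions` as in your code:

```python
from pyspark.sql.functions import col, explode, from_unixtime

# comment 1
df = (df.withColumn('explosion', explode(col('col1')))
      .filter(col('explosion')['sub_col1'] == 'some_string')
      # comment 2
      .withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
      # comment 3
      .withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3'])))
```

Because the whole right-hand side is parenthesized, no `\` is needed and the comment lines no longer break the statement.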