可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I found PySpark has a method called drop but it seems it can only drop one column at a time. Any ideas about how to drop multiple columns at the same time?

df.drop(['col1','col2'])

TypeError                                 Traceback (most recent call last)
<ipython-input-96-653b0465e457> in <module>()
----> 1 selectedMachineView = machineView.drop([['GpuName','GPU1_TwoPartHwID']])

/usr/hdp/current/spark-client/python/pyspark/sql/dataframe.pyc in drop(self, col)
   1257             jdf = self._jdf.drop(col._jc)
   1258         else:
-> 1259             raise TypeError("col should be a string or a Column")
   1260         return DataFrame(jdf, self.sql_ctx)
   1261 

TypeError: col should be a string or a Column

回答1:

Simply with select:

df.select([c for c in df.columns if c not in {'GpuName','GPU1_TwoPartHwID'}])

or if you really want to use drop then reduce should do the trick:

from functools import reduce
from pyspark.sql import DataFrame

reduce(DataFrame.drop, ['GpuName','GPU1_TwoPartHwID'], df)

Note:

(difference in execution time):

There should be no difference when it comes to data processing time. While these methods generate different logical plans physical plans are exactly the same.

There is a difference however when we analyze driver-side code:

the first method makes only a single JVM call while the second one has to call JVM for each column that has to be excluded
the first method generates logical plan which is equivalent to physical plan. In the second case it is rewritten.
finally comprehensions are significantly faster in Python than methods like map or reduce
Spark 2.x+ supports multiple columns in drop. See SPARK-11884 (Drop multiple columns in the DataFrame API) and SPARK-12204 (Implement drop method for DataFrame in SparkR) for detials.