Status of parallelization of pandas.apply() [closed]

Posted 2019-03-07 22:01

Over the last several years, a number of posts have asked about parallelizing pandas.apply(), or have described problems that could be solved by structuring the data as a DataFrame and calling pandas.apply(), if only a parallel version were implemented.

My question to the community of experts here: what is the status of this capability, given that R already has mclapply?

At the moment there is no clean standard solution. It is incredibly tedious to re-code entire functions and scripts to work with the proposed workarounds.

Python Pandas Multiprocessing Apply

Parallelize apply after pandas groupby

Parallel and Multicore Processing in R

Python multiprocessing pool.map for multiple arguments

Parallel Processing in python

passing kwargs with multiprocessing.pool.map

passing arguments and manager.dict to pool in multiprocessing in python 2.7

Is there a simple process-based parallel map for python?

Pandas with rpy2 and multiprocessing

How to asynchronously apply function via Spark to subsets of dataframe?

Efficiently applying a function to a grouped pandas DataFrame in parallel

python dask DataFrame, support for (trivially parallelizable) row apply?