Over the last several years there have been several posts related to the parallelization
of pandas.apply()
or posts that describe problems that could be solved by structuring the data as a dataframe and using pandas.apply()
if parallelization
was implemented.
My question to the community of experts here - what is the status of this capability as R
already has mclapply
.
At the moment there is no clean standard solution. It is incredibly tedious to re-code entire functions and scripts to work with the proposed workarounds.
Python Pandas Multiprocessing Apply
Parallelize apply after pandas groupby
Parallel and Multicore Processing in R
Python multiprocessing pool.map for multiple arguments
passing kwargs with multiprocessing.pool.map
passing arguments and manager.dict to pool in multiprocessing in python 2.7
Is there a simple process-based parallel map for python?
Pandas with rpy2 and multiprocessing
How to asynchronously apply function via Spark to subsets of dataframe?
Efficiently applying a function to a grouped pandas DataFrame in parallel
python dask DataFrame, support for (trivially parallelizable) row apply?
Python multiprocessing job to Celery task but AttributeError
Parallelizing apply function in pandas python. worked on groupby