-->

C# Parallel.ForEach equivalent in Python

Posted 2019-06-17 22:15

Question:

I have 96 txt files that have to be processed. Right now I am using a for loop and doing them one at a time, which is very slow. The resulting 96 files do not need to be merged. Is there a way to make them run in parallel, à la Parallel.ForEach in C#? Current code:

import glob

for src_name in glob.glob(source_dir + '/*.txt'):
    outfile = open(...)
    # The with block closes infile automatically, so no infile.close() is needed.
    with open(...) as infile:
        for line in infile:
            --PROCESS--
    for --condition--:
        outfile.write(...)
    outfile.close()

I want this process to run in parallel for all files in source_dir.

Answer 1:

Assuming that the limiting factor is indeed the processing and not the I/O, you can use joblib to easily run your loop on multiple CPUs.

A simple example from their documentation:

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
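
Applied to the loop in the question, the same pattern might look roughly like the sketch below. The idea is to move the body of the loop into a function that handles a single file, then let joblib call that function once per file. The names process_file, process_all and out_dir are hypothetical, and upper-casing each line is only a stand-in for the question's --PROCESS-- step, which the original post does not spell out.

```python
import glob
import os

from joblib import Parallel, delayed


def process_file(src_name, out_dir):
    """Process one input file and write one output file."""
    out_name = os.path.join(out_dir, os.path.basename(src_name))
    with open(src_name) as infile, open(out_name, "w") as outfile:
        for line in infile:
            # Stand-in for the question's --PROCESS-- step (an assumption):
            # upper-case each line and write it out.
            outfile.write(line.upper())
    return out_name


def process_all(source_dir, out_dir, n_jobs=-1):
    """Run process_file over every .txt file in source_dir."""
    os.makedirs(out_dir, exist_ok=True)
    src_names = sorted(glob.glob(os.path.join(source_dir, "*.txt")))
    # n_jobs=-1 asks joblib to use all available CPU cores.
    # Results come back in the same order as src_names.
    return Parallel(n_jobs=n_jobs)(
        delayed(process_file)(src_name, out_dir) for src_name in src_names
    )
```

Calling process_all(source_dir, out_dir) would then return the list of output paths. Because each file is read and written independently, no coordination between workers is needed. When running this as a script on Windows, it is safest to make the call under an if __name__ == "__main__": guard.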