Python multiprocessing: no performance gain with m

Posted 2019-08-05 17:35


This question already has an answer here:

  • How can I improve CPU utilization when using the multiprocessing module? 1 answer

Using multiprocessing, I tried to parallelize a function but I have no performance improvement:

from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import time
import numpy as np

filename = ''

trajectory = Trajectory(None, filename)

def calpha_2dmap_mult(trajectory = trajectory, t = range(0,len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
c1 = time.time() - c0

# Multiprocessing
from multiprocessing import Pool, cpu_count

pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t,)) for t in
             [range(0,2001), range(3000,5001), range(6000,8001),
c1 = time.time() - c0

The time spent calculating the distances is the 'same' without (70.1s) and with multiprocessing (70.2s)! I was perhaps not expecting a factor-of-4 improvement, but I was at least expecting some improvement! Does anyone know what I did wrong?


Pool.apply is a blocking operation:

[Pool.apply is the] equivalent of the apply() built-in function. It blocks until the result is ready, so apply_async() is better suited for performing work in parallel ..
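To make the difference concrete, here is a minimal sketch comparing the two calls. The slow_square worker and its 0.2 s sleep are hypothetical stand-ins for the real distance calculation, not part of the question's code; the timings it prints will vary by machine.

```python
import time
from multiprocessing import Pool


def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    time.sleep(0.2)
    return x * x


def run_blocking(pool, values):
    # Each apply() call waits for its result before the next task is
    # submitted, so the tasks run one after another.
    return [pool.apply(slow_square, args=(v,)) for v in values]


def run_async(pool, values):
    # Submit every task first, then collect the results; the worker
    # processes can run the tasks concurrently.
    pending = [pool.apply_async(slow_square, args=(v,)) for v in values]
    return [p.get() for p in pending]


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    with Pool(processes=4) as pool:
        for runner in (run_blocking, run_async):
            start = time.time()
            result = runner(pool, values)
            print(runner.__name__, result, "%.2fs" % (time.time() - start))
```

With four workers, the blocking variant should take roughly four times as long as the asynchronous one, which matches the behaviour seen in the question.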

In this case Pool.map is likely more appropriate for collecting the results; the map call itself blocks, but the sequence elements / transformations are processed in parallel.
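A small sketch of the Pool.map approach, assuming the work can be split into independent frame ranges. The sum_of_squares worker and the chunked helper are hypothetical names introduced only for illustration; the real worker would be the distance function.

```python
from multiprocessing import Pool


def sum_of_squares(frames):
    """Hypothetical per-chunk work: one result per range of frame indices."""
    return sum(i * i for i in frames)


def chunked(total, n_chunks):
    """Split range(total) into at most n_chunks contiguous ranges."""
    size = -(-total // n_chunks)  # ceiling division
    return [range(start, min(start + size, total))
            for start in range(0, total, size)]


if __name__ == "__main__":
    chunks = chunked(11001, 4)
    with Pool(processes=4) as pool:
        # map() blocks until every chunk is finished, but the chunks
        # themselves are handled by the worker processes in parallel.
        partial_sums = pool.map(sum_of_squares, chunks)
    # The combined result should match the serial computation.
    print(sum(partial_sums) == sum_of_squares(range(11001)))
```

Results come back in the same order as the input chunks, so they can be concatenated or summed directly.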

In addition to using partial application (or a manual realization of it), also consider expanding the data itself. It's the same cat in a different skin.

data = ((trajectory, r) for r in [range(0,2001), ..])
result = pool.map(.., data)

This can in turn be expanded:

def apply_data(d):
    return calpha_2dmap_mult(*d)

result = pool.map(apply_data, data)

The function (or a simple argument-expanding proxy for it) will need to be written to accept a single argument, but all the data is now mapped as a single unit.
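For comparison, here is a self-contained sketch of the options side by side: the tuple-unpacking proxy, functools.partial, and Pool.starmap (available since Python 3.3), which unpacks argument tuples itself. The window_sum worker and the series data are hypothetical placeholders for calpha_2dmap_mult and the trajectory.

```python
from functools import partial
from multiprocessing import Pool


def window_sum(series, frames):
    """Hypothetical two-argument worker standing in for calpha_2dmap_mult."""
    return sum(series[i] for i in frames)


def apply_data(d):
    # Single-argument proxy: unpack the (series, frames) tuple.
    return window_sum(*d)


if __name__ == "__main__":
    series = list(range(100))
    windows = [range(0, 25), range(25, 50), range(50, 100)]
    data = [(series, w) for w in windows]
    with Pool(processes=3) as pool:
        # Proxy function receiving one packed tuple per task.
        via_proxy = pool.map(apply_data, data)
        # Partial application: the first argument is fixed up front.
        via_partial = pool.map(partial(window_sum, series), windows)
        # starmap unpacks each tuple into positional arguments.
        via_starmap = pool.starmap(window_sum, data)
    print(via_proxy == via_partial == via_starmap)
```

In all three variants the arguments are pickled and sent to the worker processes, so a very large first argument can add noticeable overhead per task.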