Python multiprocessing Pool strange behavior in Wi

Posted 2019-09-08 11:19

Question:

Python's multiprocessing Pool behaves differently on Linux and Windows.

When running map with several workers, on Linux each worker process runs only the function passed as the parameter. On Windows, however, every worker also runs at the scope of the parent module and re-executes code that it should not.

For example (Flask is only there to make it similar to my code):

from multiprocessing import Pool, Event
from flask import Flask

print(">>> This code running for every each worker")

app = Flask(__name__)

terminating = None


def f(**kwargs):
    print("f()")
    x = kwargs.pop("x", 1)
    print(x * x)
    return x * x


def worker_warpper(arg):
    func, kwargs = arg
    return func(**kwargs)


def initializer(terminating_):
    global terminating
    terminating = terminating_


@app.route('/check', methods=['GET'])
def check():
    with Pool(processes=3) as pool:
        ls = [(f, {"x": 2}), (f, {"x": 5}), (f, {"x": 6})]
        pool_map = pool.map(worker_warpper, ls)
    return "Finished"


if __name__ == "__main__":
    print("Listening...")
    app.run(port=5151, host='0.0.0.0')

This code should run the function "f" (and only "f") 3 times, in 3 different processes, in parallel.

But it also runs the print at the top again. (Not exactly once per process, but there is a relation between the number of times "f" runs and the number of times the print at the top runs again.)

print(">>> This code running for every each worker")

This happens only on Windows; on Linux only "f" runs again.

Output: (Linux)

>>> This code running for new worker (not all of the workers)
Listening...
 * Running on http://0.0.0.0:5151/ (Press CTRL+C to quit)
f()
4
f()
25
f()
36
127.0.0.1 - - [29/Jan/2017 11:46:26] "GET /check HTTP/1.1" 200 -

Output: (Windows)

>>> This code running for new worker (not all of the workers)
Listening...
 * Running on http://0.0.0.0:5151/ (Press CTRL+C to quit)
>>> This code running for new worker (not all of the workers)
f()
4
f()
25
f()
36
127.0.0.1 - - [29/Jan/2017 11:46:26] "GET /check HTTP/1.1" 200 -

Why is the behavior different between Linux and Windows? And what can I do about it?

If it's not clear, tell me and I will try to explain it in a different way.

Thanks!

Answer 1:

The difference between Windows and Linux is the way a child process is started. On Linux, child processes are started using fork(): the new process starts in the same state as the parent process, so the Python code is already interpreted and the child gets a copy of the parent's memory.

On Windows it's entirely different. Processes are spawned: a new Python interpreter is started, which parses the Python file again and executes it. That's why the print statement at the top of your file is executed again.

For details see the docs about fork vs. spawn.
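To see which of the two mechanisms a given machine uses, the standard library can report it. This is a minimal sketch; the helper name describe_start_methods is made up for illustration, and the default it reports depends on both the platform and the Python version.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return (default_method, available_methods) for this interpreter."""
    # get_start_method() reports how child processes are created by default:
    # "fork", "spawn" or "forkserver", depending on platform and Python version.
    default_method = mp.get_start_method()
    # get_all_start_methods() lists every method this platform supports.
    available_methods = mp.get_all_start_methods()
    return default_method, available_methods


if __name__ == "__main__":
    default_method, available_methods = describe_start_methods()
    print("default start method:", default_method)
    print("available start methods:", available_methods)
```

On Windows only "spawn" is available, so the re-import of the main file cannot be switched off there.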

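A common pitfall is to omit the if __name__ == '__main__' guard at the bottom. Code under that guard is not re-run in the spawned workers, while module-level code above it is. Since you already have the guard in your code, you're pretty close to "safe code".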
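As a rough illustration of that guard, the sketch below keeps every side effect under it, so that a spawned worker which re-imports the file only defines functions. The names square and run_pool are hypothetical and stand in for the question's f and its Flask view.

```python
from multiprocessing import Pool


def square(x):
    # Work done inside a worker process.
    return x * x


def run_pool():
    # Hypothetical helper: start 3 workers and map square() over the inputs.
    with Pool(processes=3) as pool:
        return pool.map(square, [2, 5, 6])


if __name__ == "__main__":
    # Under spawn, workers import this file under a different module name,
    # so this block runs only in the parent process.
    print("startup message, printed once")
    print(run_pool())
```

Applied to the question's code, this would mean moving the print at the top of the file under the existing guard.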

What can I do about it?

You can use threading instead of multiprocessing. When you start a new thread, it uses the same memory space as the parent thread, so nothing is re-imported. The downside is that Python code can effectively use only one CPU core at a time because of Python's "global interpreter lock".

For details see this discussion
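The threading suggestion can be sketched with multiprocessing.pool.ThreadPool, which offers the same map interface as Pool but runs the tasks in threads of the current process. The function names here (square, call_with_kwargs, run_in_threads) are illustrative assumptions, modeled loosely on the question's f and worker_warpper.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def call_with_kwargs(task):
    # Unpack a (function, kwargs) pair and call it, as in the question.
    func, kwargs = task
    return func(**kwargs)


def run_in_threads(tasks, workers=3):
    # Threads share the parent's memory, so the module is never re-imported.
    with ThreadPool(processes=workers) as pool:
        return pool.map(call_with_kwargs, tasks)


if __name__ == "__main__":
    tasks = [(square, {"x": 2}), (square, {"x": 5}), (square, {"x": 6})]
    print(run_in_threads(tasks))
```

Because of the global interpreter lock, this mainly helps when the tasks wait on I/O rather than compute.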