Using the subprocess module to work in parallel (Multiprocessing)

Published 2019-08-05 13:10

Question:

New to multiprocessing in python, consider that you have the following function:

def do_something_parallel(self):
    result_operation1 = doit.main(A, B)  # I want this to run in another process
    do_something_else(C)                 # and this to run immediately after launching it

Now the point is that I want doit.main to run in another process and be non-blocking, so that the code in do_something_else runs immediately after the first call has been launched in the other process.

  1. How can I do this using the Python subprocess module?
  2. Is there a difference between launching a subprocess and creating a new, separate process? Why would we need child processes of another process?

Note: I do not want to use a multithreaded approach here.

EDIT: I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited.
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.

Answer 1:

If you want to run Python code in a separate process, you could use the multiprocessing module:

import multiprocessing

if __name__ == "__main__":
    multiprocessing.Process(target=doit.main, args=[A, B]).start()
    do_something_else() # this runs immediately without waiting for main() to return

I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?

No. You can use both subprocess and multiprocessing in the same function (moreover, multiprocessing may use subprocess to start its worker processes internally).
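
For example, both can be combined in one function to do exactly what the edit asks for. A minimal sketch, assuming doit.main, A, B, and do_something_else from the question, and using 'program.exe' as a placeholder for the real executable path:

import multiprocessing
import subprocess

def run_both():
    # Launch the executable in its own OS process (non-blocking).
    exe_proc = subprocess.Popen(['program.exe'])  # placeholder path
    # Run the Python function in another process (also non-blocking).
    func_proc = multiprocessing.Process(target=doit.main, args=[A, B])
    func_proc.start()
    do_something_else()   # runs immediately in the parent
    exe_proc.wait()       # optionally wait for the executable to exit
    func_proc.join()      # optionally wait for the function to return

if __name__ == "__main__":
    run_both()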

The reason I want this is that I have two things to run: first an exe file, and second a function; each needs its own process.

You don't need multiprocessing to run an external command without blocking (it runs in its own process anyway); subprocess.Popen() is enough:

import subprocess

p = subprocess.Popen(['command', 'arg 1', 'arg 2'])
do_something_else() # this runs immediately without waiting for command to exit
p.wait() # this waits for the command to finish


Answer 2:

subprocess.Popen is definitely what you want if the "worker" process is an executable. Threading is what you need when things must happen asynchronously, and multiprocessing is what you need to take advantage of multiple cores for better performance (although you will likely find yourself using threads at the same time anyway, to handle the asynchronous output of multiple parallel processes).

The main limitation of multiprocessing is passing information. When a new process is spawned, an entirely separate instance of the Python interpreter is started, with its own independent memory allocation. As a result, variables changed by one process are not changed for the other processes; for that you need shared-memory objects (also provided by the multiprocessing module).

One implementation I have done was a parent process that started several worker processes and passed each of them an input queue and an output queue. The function given to the child processes was a loop that performed some calculation on inputs pulled from the input queue and put the results on the output queue. I then designated a special input value that a child would recognize as a signal to end the loop and terminate its process.
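
A minimal, self-contained sketch of that queue-based pattern (the squaring is just a stand-in for the real calculation):

import multiprocessing

SENTINEL = None  # the special input that tells a worker to stop

def worker(in_q, out_q):
    # Pull inputs, compute, push results, until the sentinel arrives.
    for item in iter(in_q.get, SENTINEL):
        out_q.put(item * item)

if __name__ == "__main__":
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(in_q, out_q))
               for _ in range(4)]
    for w in workers:
        w.start()
    for n in range(10):
        in_q.put(n)
    for _ in workers:        # one sentinel per worker ends each loop
        in_q.put(SENTINEL)
    results = [out_q.get() for _ in range(10)]
    for w in workers:
        w.join()
    print(sorted(results))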

On your edit: Popen will start the other process in parallel, just as multiprocessing will. If you need your child process to communicate with the executable, be sure to pass the file stream handles to the child process somehow.
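
One way to do that, sketched under the assumption of the 'fork' start method (the default on Linux; on Windows, open file objects cannot be handed to a child process this way), with 'command' as a placeholder for the executable:

import multiprocessing
import subprocess

def consume(stream):
    # Read the executable's output line by line in the child process.
    for line in stream:
        print('got:', line.rstrip())

if __name__ == "__main__":
    p = subprocess.Popen(['command'], stdout=subprocess.PIPE, text=True)
    # With the fork start method the child inherits the open pipe handle.
    reader = multiprocessing.Process(target=consume, args=(p.stdout,))
    reader.start()
    reader.join()  # returns once the executable closes its stdout
    p.wait()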