I understand that using subprocess is the preferred way of calling external commands.

But what if I want to run several commands in parallel, while limiting the number of processes being spawned? What bothers me is that I can't block on the subprocesses. For example, if I call
subprocess.Popen(cmd, stderr=outputfile, stdout=outputfile)
Then the process will continue without waiting for cmd to finish, so I can't wrap it up in a worker of the multiprocessing library.
For example, if I do:
import subprocess
from multiprocessing import Pool

def worker(cmd):
    subprocess.Popen(cmd, stderr=outputfile, stdout=outputfile)

pool = Pool(processes=10)
results = [pool.apply_async(worker, [cmd]) for cmd in cmd_list]
ans = [res.get() for res in results]
then each worker finishes and returns as soon as it has spawned a subprocess, so I can't really limit the number of processes generated by subprocess by using Pool.

What's the proper way to limit the number of subprocesses?
You can use subprocess.call if you want to wait for the command to complete. See pydoc subprocess for more information.

You could also call the Popen.wait method in your worker:
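For example, a minimal sketch reusing outputfile and cmd_list from the question:

import subprocess
from multiprocessing import Pool

def worker(cmd):
    # Popen starts the command; wait() blocks until it exits,
    # so the pool slot stays busy for the lifetime of the subprocess
    p = subprocess.Popen(cmd, stderr=outputfile, stdout=outputfile)
    return p.wait()

pool = Pool(processes=10)  # at most 10 commands run at once
results = [pool.apply_async(worker, [cmd]) for cmd in cmd_list]
exit_codes = [res.get() for res in results]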
You don't need multiple Python processes or even threads to limit the maximum number of parallel subprocesses:
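A rough sketch (assuming cmd_list and outputfile from the question, and a hypothetical limit of 10): start at most 10 subprocesses with Popen, then poll() them and start a replacement whenever one finishes.

import time
from itertools import islice
from subprocess import Popen

max_workers = 10  # hypothetical limit on concurrent subprocesses
processes = (Popen(cmd, stdout=outputfile, stderr=outputfile) for cmd in cmd_list)
running = list(islice(processes, max_workers))  # start the first batch

while running:
    for i, p in enumerate(running):
        if p.poll() is not None:                 # this subprocess has finished
            running[i] = next(processes, None)   # start a replacement, if any remain
            if running[i] is None:               # nothing left to start
                del running[i]
                break
    time.sleep(0.1)  # don't busy-wait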
See Iterate an iterator by chunks (of n) in Python?
If you'd like to limit both the maximum and the minimum number of parallel subprocesses, you could use a thread pool:
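A minimal sketch using multiprocessing.dummy (which provides the Pool API on top of threads), again with cmd_list and outputfile from the question and a hypothetical limit of 10:

from multiprocessing.dummy import Pool  # a thread pool with the multiprocessing API
from subprocess import call

limit = 10  # hypothetical maximum number of concurrent subprocesses

def run(cmd):
    # call() blocks until the command exits, so each thread
    # runs exactly one subprocess at a time
    return cmd, call(cmd, stdout=outputfile, stderr=outputfile)

pool = Pool(limit)  # exactly limit worker threads
for cmd, exit_code in pool.imap_unordered(run, cmd_list):
    if exit_code != 0:
        print('{} failed with exit code {}'.format(cmd, exit_code))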
As soon as any of the limit subprocesses ends, a new subprocess is started to maintain limit running subprocesses at all times.

Or using ThreadPoolExecutor:
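A sketch along the same lines (assuming cmd_list and outputfile from the question and a hypothetical limit of 10):

from concurrent.futures import ThreadPoolExecutor
from subprocess import call

limit = 10  # hypothetical maximum number of concurrent subprocesses

with ThreadPoolExecutor(max_workers=limit) as executor:
    # submit() queues the commands; at most limit of them run at any one time
    futures = [executor.submit(call, cmd, stdout=outputfile, stderr=outputfile)
               for cmd in cmd_list]
    exit_codes = [f.result() for f in futures]  # wait for all commands to finish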
Here's a simple thread pool implementation:
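For instance, a sketch built on queue.Queue and threading.Thread (again assuming cmd_list, outputfile, and a hypothetical limit of 10):

from queue import Queue
from subprocess import call
from threading import Thread

def worker(queue):
    # each thread keeps pulling commands until it sees the None sentinel
    for cmd in iter(queue.get, None):
        call(cmd, stdout=outputfile, stderr=outputfile)

limit = 10  # hypothetical maximum number of concurrent subprocesses
q = Queue()
threads = [Thread(target=worker, args=(q,)) for _ in range(limit)]
for t in threads:
    t.daemon = True  # threads die if the main program exits
    t.start()

for cmd in cmd_list:
    q.put(cmd)       # queue all the work
for _ in threads:
    q.put(None)      # one sentinel per thread: no more commands
for t in threads:
    t.join()         # wait until every command has finished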
To avoid premature exit, add exception handling.
If you want to capture subprocess' output in a string, see Python: execute cat subprocess in parallel.