I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.
My code did something like:
import multiprocessing
import logging

def cycle(offset):
    # Do stuff

def run():
    for nprocess in process_per_cycle:
        logger.info("Start cycle with %d processes", nprocess)
        offsets = list(range(nprocess))
        pool = multiprocessing.Pool(nprocess)
        pool.map(cycle, offsets)
But I was getting this error: OSError: [Errno 24] Too many open files
So, the code was opening too many file descriptors, i.e., it was starting too many processes and not terminating them.
I fixed it by replacing the last two lines with these lines:
with multiprocessing.Pool(nprocess) as pool:
    pool.map(cycle, offsets)
But I do not know exactly why those lines fixed it.
What is happening underneath that with statement?
You're creating new processes inside a loop, and then forgetting to close them once you're done with them. As a result, there comes a point where you have too many open processes. This is a bad idea.
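To see what that means in terms of file descriptors, here is a minimal, Linux-only sketch (the pool sizes and the /proc/self/fd check are mine, just for illustration): each Pool you create keeps pipes to its worker processes open until you terminate it.

import multiprocessing
import os

def count_fds():
    # On Linux, every entry in /proc/self/fd is an open file descriptor of this process.
    return len(os.listdir('/proc/self/fd'))

if __name__ == '__main__':
    print("fds at start:", count_fds())
    pools = [multiprocessing.Pool(4) for _ in range(5)]   # 5 pools, never cleaned up
    print("fds with 5 live pools:", count_fds())
    for pool in pools:
        pool.terminate()   # stop the workers...
        pool.join()        # ...and wait for them, releasing the pipes
    print("fds after terminating them:", count_fds())

The middle count is noticeably higher than the other two; do the same thing in a loop with no cleanup and you eventually hit Errno 24.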
You could fix this by using a context manager, which automatically calls pool.terminate(), or by calling pool.terminate() yourself once the work is done. Alternatively, why don't you create a pool outside the loop just once, and then send tasks to its processes inside the loop? A sketch of both options follows. For more information, you could peruse the multiprocessing.Pool documentation.
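Something along these lines, reusing the cycle and process_per_cycle names from the question (the function bodies are placeholders, not the real work):

import multiprocessing

def cycle(offset):
    pass  # stand-in for the real work

# Option 1: keep the loop, but release each pool explicitly.
def run_with_cleanup(process_per_cycle):
    for nprocess in process_per_cycle:
        pool = multiprocessing.Pool(nprocess)
        try:
            pool.map(cycle, range(nprocess))
        finally:
            pool.terminate()   # what the `with` form does for you on exit
            pool.join()

# Option 2: create one pool up front and reuse it for every cycle.
def run_with_one_pool(process_per_cycle):
    with multiprocessing.Pool(max(process_per_cycle)) as pool:
        for nprocess in process_per_cycle:
            pool.map(cycle, range(nprocess))

if __name__ == '__main__':
    run_with_one_pool([2, 4, 8])

Option 2 also avoids paying the worker start-up cost on every iteration.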
It is a context manager. Using with ensures that you acquire and release the resource properly, even if the body raises an exception. To understand this in detail, I'd recommend this article: https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
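Roughly speaking, the with block in the fixed code behaves like the try/finally below; this is a sketch of the equivalence, not the actual library source:

import multiprocessing

def cycle(offset):
    pass  # placeholder

nprocess = 4
offsets = list(range(nprocess))

pool = multiprocessing.Pool(nprocess)   # __enter__ just returns the pool
try:
    pool.map(cycle, offsets)
finally:
    pool.terminate()                     # __exit__ calls terminate(), even if map raised
pool.join()                              # optional: wait for the workers to exit

Without the terminate() (or close()/join()) step, each pass through the loop leaves worker processes and their pipes behind, which is where the leaked file descriptors come from.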