multiprocessing gives AssertionError: daemonic pro

2020-03-26 09:20发布

问题:

I am trying to use multiprocessing for the first time. So I thought I would make a very simple test example which factors 100 different numbers.

from multiprocessing import Pool
from primefac import factorint
N = 10**30
L = range(N,N + 100)
pool = Pool()
pool.map(factorint, L)

This gives me the error:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    pool.map(factorint, L)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
AssertionError: daemonic processes are not allowed to have children

I see that Python Process Pool non-daemonic? discusses this problem but I don't understand why it is relevant to my simple toy example. What am I doing wrong?

回答1:

The problem appears to be that primefac uses its own multiprocessing.Pool. Unfortunately, while PyPI is down, I can't find the source to the module—but I did find various forks on GitHub, like this one, and they all have multiprocessing code.

So, your apparently simple example isn't all that simple—because it's importing and running non-simple code.

By default, all Pool processes are daemonic, so you can't create more child processes from inside another Pool. Usually, attempting to do so is a mistake.

If you really do want to multiprocess the factors even though some of them are going to multiprocess their own work (quite possibly adding more contention overhead without adding any parallelism), then you just have to subclass Pool and override that—as explained in the related question that you linked.

But the simplest thing is to just not use multiprocessing here, if primefac is already using your cores efficiently. (If you need quasi-concurrency, getting answers as they come in instead of getting them in sequence, I suppose you could do that with a thread pool, but I don't think there's any advantage to that here—you're not using imap_unordered or explicit AsyncResult anywhere.)

Alternatively, if it's not using all of your cores most of the time, only doing so for the "tricky remainders" at the end of factoring some numbers, while you've got 7 cores sitting idle for 60% of the time… then you probably want to prevent primefac from using multiprocessing at all. I don't know if the module has a public API for doing that. If so, of course, just use it. If not… well, you may have to subclass or monkeypatch some of its code, or, at worst, monkeypatching its import of multiprocessing, and that may not be worth doing.

The ideal solution would probably be to refactor primefac to push the "tricky remainder" jobs onto the same pool you're already using. But that's probably by far the most work, and not that much more benefit.


As a side note, this isn't your problem, but you should have a __main__ guard around your top-level code, like this:

from multiprocessing import Pool
from primefac import factorint

if __name__ == '__main__':
    N = 10**30
    L = range(N,N + 100)
    pool = Pool()
    pool.map(factorint, L)

Otherwise, when run with the spawn or forkserver startmethods—and notice that spawn is the only one available on Windows—each pool process is going to try to create another pool of children. So, if you run your code on Windows, you would get this same assertion—as a way for multiprocessing to protect you from accidentally forkbombing your system.

This is explained under safe importing of main module in the "programming guidelines" section of the multiprocessing docs.