Question:

I'm trying to build a Python daemon that launches other fully independent processes.

The general idea is for a given shell command, poll every few seconds and ensure that exactly k instances of the command are running. We keep a directory of pidfiles, and when we poll we remove pidfiles whose pids are no longer running and start up (and make pidfiles for) however many processes we need to get to k of them.
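Here is a minimal sketch of the pidfile bookkeeping described above. It uses Python 3, and the helper names pid_is_running, prune_pidfiles and instances_to_start are hypothetical, not from the original code. It assumes each pidfile holds a single pid. It probes liveness with os.kill(pid, 0), which delivers no signal but fails with ESRCH once the process is gone. There are two caveats. A zombie still counts as running for this probe, and a recycled pid can make a dead instance look alive.

```python
import os


def pid_is_running(pid):
    """Probe a pid with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:     # EPERM: it exists but belongs to another user
        return True
    return True


def prune_pidfiles(pid_dir):
    """Delete pidfiles whose process is gone and return the pids still alive."""
    alive = []
    for name in os.listdir(pid_dir):
        if not name.endswith('.pid'):
            continue
        path = os.path.join(pid_dir, name)
        with open(path) as f:
            pid = int(f.read().strip())
        if pid_is_running(pid):
            alive.append(pid)
        else:
            os.remove(path)  # stale pidfile: its process has exited
    return alive


def instances_to_start(pid_dir, k):
    """Number of new copies of the command needed to get back up to k."""
    return max(0, k - len(prune_pidfiles(pid_dir)))
```

Each poll would then spawn `instances_to_start('pids', k)` new copies of the command.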

The child processes also need to be fully independent, so that if the parent process dies the children won't be killed. From what I've read, it seems there is no way to do this with the subprocess module. To this end, I used the snippet mentioned here:

http://code.activestate.com/recipes/66012-fork-a-daemon-process-on-unix/

I made a couple of necessary modifications (you'll see the lines commented out in the attached snippet):

- The original parent process can't exit, because we need the launcher daemon to persist indefinitely.
- The child processes need to start with the same cwd as the parent.

Here's my spawn fn and a test:

import os
import sys
import subprocess
import time

def spawn(cmd, child_cwd):
    """
    do the UNIX double-fork magic, see Stevens' "Advanced 
    Programming in the UNIX Environment" for details (ISBN 0201563177)
    http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
    """
    try: 
        pid = os.fork() 
        if pid > 0:
            # exit first parent
            #sys.exit(0) # parent daemon needs to stay alive to launch more in the future
            return
    except OSError as e:
        sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1)

    # decouple from parent environment
    #os.chdir("/") # we want the child processes to keep the parent's cwd
    os.setsid() 
    os.umask(0) 

    # do second fork
    try: 
        pid = os.fork() 
        if pid > 0:
            # exit from second parent
            sys.exit(0) 
    except OSError as e:
        sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1) 

    # redirect standard file descriptors
    sys.stdout.flush()
    sys.stderr.flush()
    si = open(os.devnull, 'r')
    so = open(os.devnull, 'a+')
    se = open(os.devnull, 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())

    pid = subprocess.Popen(cmd, cwd=child_cwd, shell=True).pid

    # write pidfile       
    with open('pids/%s.pid' % pid, 'w') as f: f.write(str(pid))
    sys.exit(1)

def mkdir_if_none(path):
    if not os.access(path, os.R_OK):
        os.mkdir(path)

if __name__ == '__main__':
    try:
        cmd = sys.argv[1]
        num = int(sys.argv[2])
    except (IndexError, ValueError):
        print('Usage: %s <cmd> <num procs>' % __file__)
        sys.exit(1)
    mkdir_if_none('pids')
    mkdir_if_none('test_cwd')

    for i in range(num):
        print('spawning %d...' % i)
        spawn(cmd, 'test_cwd')
        time.sleep(0.01) # give the system some breathing room

In this situation things seem to work fine, and the child processes persist even when the parent is killed. However, I'm still running into a spawn limit on the original parent. After ~650 spawns (not concurrently; the children have already finished), the parent process chokes with the error:

spawning 650...
fork #2 failed: 35 (Resource temporarily unavailable)

Is there any way to rewrite my spawn function so that I can spawn these independent child processes indefinitely? Thanks!

Answer 1:

Based on your list of processes, I'm willing to bet that you have hit one of several fundamental limits:

- rlimit nproc: the maximum number of processes a given user is allowed to run; see setrlimit(2), the bash(1) ulimit built-in, and /etc/security/limits.conf for details on per-user process limits.
- rlimit nofile: the maximum number of file descriptors a given process is allowed to have open at once. (Each new process probably creates three new pipes in the parent, for the child's stdin, stdout, and stderr descriptors.)
- The system-wide maximum number of processes; see /proc/sys/kernel/pid_max.
- The system-wide maximum number of open files; see /proc/sys/fs/file-max.
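Here is one way to inspect these limits from Python. It is a sketch for Python 3 on Unix, and the helper names show, per_process_limits and read_proc_int are hypothetical. resource.getrlimit returns a (soft, hard) pair. The /proc paths exist only on Linux, so the reader returns None elsewhere.

```python
import resource


def show(value):
    """Render a limit value, spelling out RLIM_INFINITY."""
    return 'unlimited' if value == resource.RLIM_INFINITY else str(value)


def per_process_limits():
    """Return {name: (soft, hard)} for RLIMIT_NPROC and RLIMIT_NOFILE where defined."""
    limits = {}
    for name in ('RLIMIT_NPROC', 'RLIMIT_NOFILE'):
        res = getattr(resource, name, None)  # RLIMIT_NPROC is missing on some platforms
        if res is not None:
            limits[name] = resource.getrlimit(res)
    return limits


def read_proc_int(path):
    """Read a system-wide limit such as /proc/sys/kernel/pid_max; None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == '__main__':
    for name, (soft, hard) in per_process_limits().items():
        print('%s: soft=%s hard=%s' % (name, show(soft), show(hard)))
    for path in ('/proc/sys/kernel/pid_max', '/proc/sys/fs/file-max'):
        print('%s: %s' % (path, read_proc_int(path)))
```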

Because you're not reaping your dead children, they linger as zombies and many of these resources are held longer than they should be. Your second-generation children are handled properly by init(8): their parent is dead, so they are re-parented to init(8), and init(8) cleans up after them (wait(2)) when they die.

However, your program is responsible for cleaning up after the first set of children. C programs typically install a signal(7) handler for SIGCHLD that calls wait(2) or waitpid(2) to reap each child's exit status and thus remove its entry from the kernel's memory.
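Here is a sketch of the most direct form of that cleanup. It does not come from either answer. Right after the first fork, the parent calls os.waitpid on the intermediate child. That child exits almost immediately, so nothing is left to reap later.

The sketch makes several assumptions:
- It assumes Python 3 and /bin/sh, and spawn_detached is a hypothetical name.
- The grandchild execs the command itself instead of going through subprocess.
- The umask change and stdio redirection from the question are omitted for brevity.

The children leave through os._exit, so they don't run the parent's cleanup code or flush its inherited stdio buffers a second time.

```python
import os


def spawn_detached(cmd, child_cwd):
    """Run `cmd` via /bin/sh as a grandchild that init adopts; reap the middle child."""
    pid = os.fork()
    if pid > 0:
        # Original parent: the intermediate child exits almost at once, so this
        # waitpid returns quickly and leaves no zombie behind.
        os.waitpid(pid, 0)
        return
    try:
        os.setsid()  # new session, detached from the parent's controlling terminal
        if os.fork() > 0:
            os._exit(0)  # intermediate child: exit so the grandchild is orphaned
        # Grandchild: keeps running after the intermediate child is gone.
        os.chdir(child_cwd)
        os.execv('/bin/sh', ['/bin/sh', '-c', cmd])
    finally:
        # Only reached if something above raised: never return into the caller.
        os._exit(127)
```

In the question's main loop, spawn_detached(cmd, 'test_cwd') would stand in for spawn(cmd, 'test_cwd'). To keep the pidfiles, the grandchild could write os.getpid() to its pidfile just before execv.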

But signal handling in a script is a bit annoying. If you explicitly set the SIGCHLD disposition to SIG_IGN, the kernel knows that you are not interested in the exit status and reaps the children for you.

Try adding:

import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

near the top of your program.

Note that I don't know how this interacts with the subprocess module; it might not be pleased. If it isn't, you'll need to install a SIGCHLD handler that calls wait(2) for you.
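Here is a sketch of such a handler in Python 3. The names reap_children and install_reaper are hypothetical. It loops with WNOHANG because several child exits can be coalesced into a single SIGCHLD. It reaps any child, so it can collect a subprocess.Popen child before Popen.wait() does. In that case Python 3's subprocess cannot recover the real exit status and reports 0 instead.

```python
import os
import signal


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has exited, without blocking."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # ECHILD: this process has no children left
            return
        if pid == 0:               # children remain, but none has exited yet
            return


def install_reaper():
    """Replace the default SIGCHLD disposition with the reaping handler."""
    signal.signal(signal.SIGCHLD, reap_children)
```

Call install_reaper() once near the top of the program, in the main thread, since Python only allows setting signal handlers there.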

Answer 2:

I slightly modified your code and was able to run 5000 processes without any issues, so I agree with @sarnold that you hit some fundamental limitation. My modifications to the end of spawn() are:

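# close_fds=True: the command inherits only stdin, stdout and stderr
proc = subprocess.Popen(cmd, cwd=child_cwd, shell=True, close_fds=True)
pid = proc.pid

# write pidfile
with open('pids/%s.pid' % pid, 'w') as f: f.write(str(pid))
proc.wait()  # block until the command exits and reap it here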
sys.exit(1)