Python multiprocessing issue / misunderstanding

Published 2020-06-26 07:57

Question:

I'm having an issue with multiprocessing; I'm using Python 2.7 on Linux 2.6.36. I know this would be much easier with a higher-level module or library, but I'm trying to use the lower-level functions (os.fork() and os.exec*) to make sure I really understand what is going on. This is kind of a learning exercise.

Below is my code; it's a multi-process 'ping' utility. The issue is that, while it seems to run, every so often it raises an OSError ("No child processes") on the os.wait() line.

That doesn't make sense to me as os.wait() should only be called when the program catches a signal that a child process has exited and needs to be reaped.

Following the code is sample output.

What am I doing wrong?

#!/usr/bin/python2.7

import os
import time
import sys
import signal

kids = []

def chldClean(SIG, FRM):
    global kids
    pid, status = os.wait()
    kids.pop(kids.index(pid))

signal.signal(signal.SIGCHLD, chldClean)

hosts = ( '10.98.232.66',
          '10.86.144.241',
          '10.86.144.242',
          '10.98.193.130',
          '10.98.198.130',
          '10.86.116.161',
          '10.86.120.161',
          '10.99.21.254',
          '10.97.98.102',
          '10.97.111.180' )


while True:
    for HOST in hosts:
        while len(kids) > 7:
            time.sleep(0.1)
        pid = os.fork()
        if pid == 0:
            os.closerange(0, 4)
            os.execl('/bin/ping', 'PING', '-c', '1', '-w', '2', HOST)
        else:
            kids.append(pid)
            print kids

Sample Output with Errors below ----------------------------------------

[18188]
[18188, 18189]
[18188, 18189, 18190]
[18188, 18189, 18190, 18191]
[18188, 18189, 18190, 18191, 18192]
[18188, 18189, 18190, 18191, 18192, 18193]
[18188, 18189, 18190, 18191, 18192, 18193, 18194]
[18188, 18189, 18190, 18191, 18192, 18193, 18194, 18195]
[18188, 18189, 18191, 18192, 18193, 18194, 18195, 18196]
[18188, 18191, 18192, 18193, 18194, 18195, 18196, 18197]
[18188, 18191, 18192, 18194, 18195, 18196, 18197, 18198]
[18191, 18192, 18194, 18195, 18196, 18197, 18198, 18201]
[18191, 18194, 18195, 18196, 18197, 18198, 18201, 18202]
[18191, 18195, 18196, 18197, 18198, 18201, 18202, 18203]
[18195, 18196, 18197, 18198, 18201, 18202, 18203, 18204]
[18196, 18197, 18198, 18202, 18203, 18204, 18205]
[18196, 18197, 18198, 18202, 18203, 18204, 18205, 18206]
[18197, 18198, 18202, 18203, 18204, 18205, 18206, 18207]
[18198, 18203, 18204, 18205, 18206, 18207, 18210]
[18198, 18203, 18204, 18205, 18206, 18207, 18210, 18211]
Traceback (most recent call last):
  File "./sunmon-mp", line 33, in <module>
    pid = os.fork()
  File "./sunmon-mp", line 12, in chldClean
    pid, status = os.wait()
OSError: [Errno 10] No child processes
[18203, 18204, 18205, 18206, 18207, 18210, 18211, 18212]
[18203, 18204, 18206, 18207, 18210, 18211, 18212, 18213]
[18203, 18204, 18206, 18207, 18211, 18212, 18213, 18214]
[18203, 18204, 18206, 18207, 18211, 18212, 18214, 18215]
[18203, 18204, 18206, 18207, 18212, 18214, 18215, 18217]
[18203, 18204, 18206, 18207, 18214, 18215, 18217, 18218]
[18203, 18204, 18206, 18207, 18215, 18217, 18218, 18219]
[18204, 18206, 18207, 18215, 18217, 18218, 18219, 18220]
[18204, 18206, 18207, 18217, 18218, 18219, 18220, 18221]
[18206, 18207, 18217, 18218, 18219, 18220, 18221, 18223]
[18207, 18217, 18218, 18219, 18220, 18221, 18223, 18224]
[18217, 18218, 18219, 18220, 18221, 18223, 18224, 18225]
[18217, 18219, 18220, 18221, 18223, 18224, 18225, 18226]
[18217, 18219, 18220, 18221, 18223, 18225, 18226, 18227]
[18217, 18219, 18220, 18221, 18223, 18226, 18227, 18228]
[18217, 18220, 18221, 18223, 18226, 18227, 18228, 18229]
[18217, 18220, 18221, 18223, 18227, 18228, 18229, 18230]
[18217, 18220, 18221, 18223, 18227, 18228, 18230, 18231]
[18220, 18221, 18223, 18227, 18228, 18230, 18231, 18233]
[18221, 18223, 18227, 18228, 18230, 18231, 18233, 18234]
[18223, 18227, 18228, 18230, 18231, 18233, 18234, 18235]
[18223, 18227, 18228, 18231, 18233, 18234, 18235, 18236]
[18223, 18227, 18228, 18231, 18233, 18234, 18236, 18237]
[18223, 18227, 18228, 18231, 18233, 18234, 18237, 18239]
[18227, 18228, 18231, 18233, 18234, 18237, 18239, 18240]
[18228, 18231, 18233, 18234, 18237, 18239, 18240, 18241]
[18228, 18231, 18233, 18237, 18239, 18240, 18241, 18242]
[18231, 18233, 18237, 18239, 18240, 18241, 18242, 18243]
[18231, 18233, 18239, 18240, 18241, 18242, 18243, 18244]
[18231, 18233, 18239, 18240, 18242, 18243, 18244, 18245]
[18231, 18233, 18239, 18242, 18243, 18244, 18245, 18246]
[18231, 18233, 18242, 18243, 18244, 18245, 18246, 18247]
[18233, 18242, 18243, 18244, 18245, 18246, 18247, 18248]
[18242, 18243, 18244, 18245, 18246, 18247, 18248, 18249]
[18243, 18244, 18245, 18246, 18247, 18248, 18249, 18250]
[18243, 18245, 18246, 18247, 18248, 18249, 18250, 18251]
[18243, 18245, 18247, 18248, 18249, 18250, 18251, 18252]
[18243, 18245, 18248, 18249, 18250, 18251, 18252, 18253]
[18243, 18245, 18249, 18250, 18251, 18252, 18253, 18254]
[18243, 18245, 18249, 18250, 18252, 18253, 18254, 18255]
[18245, 18249, 18250, 18252, 18253, 18254, 18255, 18258]
[18249, 18250, 18252, 18253, 18254, 18255, 18258, 18259]
[18249, 18250, 18253, 18254, 18255, 18258, 18259, 18260]
[18249, 18250, 18253, 18254, 18255, 18258, 18260, 18261]
[18249, 18250, 18253, 18254, 18255, 18260, 18261, 18262]
[18250, 18253, 18254, 18255, 18260, 18261, 18262, 18263]
[18253, 18254, 18255, 18260, 18261, 18262, 18263, 18264]
[18253, 18254, 18255, 18261, 18262, 18263, 18264, 18265]
[18253, 18254, 18255, 18261, 18262, 18264, 18265, 18266]
[18254, 18255, 18261, 18262, 18264, 18265, 18266, 18267]
[18255, 18261, 18262, 18264, 18265, 18266, 18267, 18268]
[18261, 18262, 18264, 18265, 18266, 18267, 18268, 18269]
[18261, 18262, 18265, 18266, 18267, 18268, 18269, 18270]
[18261, 18262, 18265, 18266, 18267, 18268, 18270, 18271]
[18261, 18262, 18265, 18266, 18267, 18270, 18271, 18273]
[18261, 18262, 18265, 18266, 18270, 18271, 18273, 18274]
[18261, 18262, 18265, 18266, 18271, 18273, 18274, 18275]
[18261, 18262, 18265, 18266, 18271, 18273, 18275, 18276]
[18262, 18265, 18266, 18271, 18273, 18275, 18276, 18277]
[18262, 18265, 18266, 18273, 18275, 18276, 18277, 18278]
[18265, 18266, 18273, 18276, 18277, 18278, 18280]
[18265, 18266, 18273, 18276, 18277, 18278, 18280, 18281]
Traceback (most recent call last):
  File "./sunmon-mp", line 33, in <module>
    pid = os.fork()
  File "./sunmon-mp", line 12, in chldClean
    pid, status = os.wait()
OSError: [Errno 10] No child processes
[18265, 18273, 18276, 18277, 18278, 18280, 18282]
[18265, 18276, 18277, 18278, 18280, 18281, 18282, 18283]
[18265, 18276, 18278, 18281, 18282, 18283, 18284]
[18265, 18276, 18278, 18281, 18282, 18283, 18284, 18285]
[18265, 18276, 18278, 18282, 18283, 18284, 18285, 18286]
[18265, 18276, 18278, 18283, 18284, 18286, 18289]
[18265, 18276, 18278, 18283, 18284, 18286, 18289, 18290]
Traceback (most recent call last):
  File "./sunmon-mp", line 33, in <module>
    pid = os.fork()
  File "./sunmon-mp", line 12, in chldClean
    pid, status = os.wait()
OSError: [Errno 10] No child processes
[18265, 18276, 18278, 18283, 18284, 18289, 18290, 18291]
[18276, 18278, 18283, 18284, 18289, 18290, 18291, 18292]
[18276, 18278, 18283, 18284, 18290, 18291, 18292, 18293]
[18276, 18278, 18283, 18284, 18290, 18291, 18293, 18294]
[18276, 18278, 18283, 18284, 18290, 18291, 18294, 18295]
[18278, 18283, 18284, 18290, 18291, 18294, 18295, 18297]
[18283, 18284, 18290, 18291, 18294, 18295, 18297, 18298]
[18283, 18284, 18290, 18291, 18295, 18297, 18298, 18299]

Answer 1:

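Perhaps you are being affected by a bug causing child processes to inherit pending signals. That would explain why the stack trace appears more than once while the parent keeps running: each traceback points at the os.fork() line, and the list of PIDs keeps printing afterwards, which is consistent with the handler running in the freshly forked child. The child is trying to wait on its own non-existent child.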

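Also, several pending signals of the same type may be delivered as a single signal, so one call to wait() per handler invocation can leave exited children unreaped. For that reason I don't recommend using wait() in the signal handler.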
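
One way to act on this is to drain every exited child each time the handler fires, rather than calling wait() once. The following is only a minimal sketch, not the asker's code: reap_children is a hypothetical helper name, and kids is assumed to be the list of child PIDs from the question. It relies on os.waitpid(-1, os.WNOHANG) returning a pid of 0 when children exist but none has exited yet, and raising OSError with errno ECHILD when the process has no children at all.

```python
import errno
import os


def reap_children(kids):
    """Reap every child that has already exited and return the reaped PIDs.

    `kids` is a list of child PIDs; reaped PIDs are removed from it in place.
    (Hypothetical helper, written to run on both Python 2.7 and Python 3.)
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except OSError as exc:
            if exc.errno == errno.ECHILD:
                break  # this process has no children at all
            raise
        if pid == 0:
            break  # children exist, but none has exited yet
        if pid in kids:
            kids.remove(pid)
        reaped.append(pid)
    return reaped
```

It could be installed with signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children(kids)). Because ECHILD is swallowed, a handler that happens to run in a process with no children returns quietly instead of raising.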



Answer 2:

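You might have the same race condition problem that is described in this SO question. Unfortunately, I can't test your code right now (Windows environment, so no SIGCHLD), but it seems that using os.waitpid(-1, os.WNOHANG) on the problematic line 12 would at least keep the handler from blocking. Note that it returns (0, 0) when children exist but none has exited yet, so check for a pid of 0 before touching kids, and that it still raises OSError when the process has no children at all, so that case needs to be caught as well. Either way, you get no guarantee that you won't run into the race condition described above.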
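
If the race cannot be ruled out, a different technique is to drop the SIGCHLD handler altogether and reap children synchronously from the main loop. The sketch below is an assumption-laden illustration rather than a fix that has been tested against the original script: run_bounded and its limit parameter are hypothetical names, and each command is assumed to be an argv list whose first element is an absolute path.

```python
import os


def run_bounded(commands, limit=8):
    """Run each argv list in its own child, at most `limit` at a time.

    Returns a list of (pid, status) pairs as reported by os.wait().
    No signal handler is involved: the parent blocks in os.wait()
    whenever the pool is full, so there is no handler to race with.
    """
    running = set()
    finished = []
    for argv in commands:
        if len(running) >= limit:
            pid, status = os.wait()  # block until some child exits
            running.discard(pid)
            finished.append((pid, status))
        pid = os.fork()
        if pid == 0:
            try:
                os.execv(argv[0], argv)
            finally:
                os._exit(127)  # reached only if execv itself failed
        running.add(pid)
    while running:  # collect the stragglers
        pid, status = os.wait()
        running.discard(pid)
        finished.append((pid, status))
    return finished
```

For the ping case this might be called as run_bounded([['/bin/ping', '-c', '1', '-w', '2', host] for host in hosts]); the exit code of each child can then be read with os.WEXITSTATUS(status).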