Catching / blocking SIGINT during system call

Posted 2019-02-09 09:47

Question:

I've written a web crawler that I'd like to be able to stop via the keyboard. I don't want the program to die when I interrupt it; it needs to flush its data to disk first. I also don't want to catch KeyboardInterrupt, because the persistent data could be in an inconsistent state.

My current solution is to define a signal handler that catches SIGINT and sets a flag; each iteration of the main loop checks this flag before processing the next URL.

However, I've found that if the system happens to be executing socket.recv() when I send the interrupt, I get this:

^C
Interrupted; stopping...  // indicates my interrupt handler ran
Traceback (most recent call last):
  File "crawler_test.py", line 154, in <module>
    main()
  ...
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
socket.error: [Errno 4] Interrupted system call

and the process exits completely. Why does this happen? Is there a way I can prevent the interrupt from affecting the system call?

Answer 1:

socket.recv() calls the underlying POSIX recv() function in the C layer. If the process receives SIGINT while recv() is blocked waiting for incoming data, recv() fails with the error code EINTR. In C, you would check for this code to tell that recv() returned because a signal arrived, not because more data became available on the socket. Python turns the error code into a socket.error exception, and since nothing catches it, it terminates your application with the traceback you see. The solution is to catch socket.error, check the error code, and silently ignore the exception if the code equals errno.EINTR. Something like this:

import errno
import socket

try:
    # do something
    result = conn.recv(bufsize)
except socket.error as e:
    # Re-raise everything except an interrupted system call.
    if e.errno != errno.EINTR:
        raise
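
Note that when the snippet above swallows EINTR, result is never assigned, so in practice the call usually needs to be retried. A minimal sketch of that retry, assuming a blocking socket; the helper name recv_retrying is hypothetical and not part of the socket module:

```python
import errno
import socket


def recv_retrying(conn, bufsize):
    """Call conn.recv(bufsize), retrying whenever a signal interrupts it.

    recv_retrying is an illustrative helper, not a standard-library function.
    """
    while True:
        try:
            return conn.recv(bufsize)
        except socket.error as e:
            # EINTR means a signal handler ran while recv() was blocked;
            # any other error is a real failure and is re-raised.
            if e.errno != errno.EINTR:
                raise
```

Usage would look like data = recv_retrying(conn, 4096). On Python 3.5 and later the interpreter already retries interrupted system calls itself when the signal handler does not raise, so a loop like this mainly matters on older versions such as the 2.6 shown in the traceback.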


Answer 2:

If you don't want your socket call to be interrupted, disable the interrupt behavior after you set the signal handler:

signal.signal(<your signal here>, <your signal handler function here>)
signal.siginterrupt(<your signal here>, False)

In the signal handling function, set a flag (e.g. a threading.Event()), then check that flag in your main processing function and terminate your crawler gracefully.
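
Putting the pieces together, here is a rough sketch of how this could look for the crawler in the question. The names stop_requested, handle_sigint, install_handler and crawl, and the fetch and flush callables, are all assumptions made for illustration; signal.siginterrupt is only available on Unix.

```python
import signal
import threading

# Set by the handler, polled by the main loop.
stop_requested = threading.Event()


def handle_sigint(signum, frame):
    # Only record the request; the main loop decides when to stop.
    stop_requested.set()


def install_handler():
    # Must be called from the main thread.
    signal.signal(signal.SIGINT, handle_sigint)
    # Ask the OS to restart system calls interrupted by SIGINT
    # instead of failing them with EINTR.
    signal.siginterrupt(signal.SIGINT, False)


def crawl(urls, fetch, flush):
    """Fetch each URL until done or until a stop is requested, then flush.

    fetch and flush are hypothetical callables supplied by the caller.
    """
    results = []
    for url in urls:
        if stop_requested.is_set():
            break
        results.append(fetch(url))
    # Always reached, so the data on disk stays consistent.
    flush(results)
    return results
```

One consequence of this design: with restarts enabled, a recv() that is blocked when Ctrl-C arrives keeps waiting, so the flag is only noticed once the current fetch finishes. A socket timeout is one way to bound that delay.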

Background info here:

  • Linux signal man page (see the discussion of the SA_RESTART flag)
  • Python docs