How to signal an application without killing it in

I have a watchdog application. It watches my main app, which might crash for one reason or another (I know it is bad, but this is not the point).

I programmed this watchdog to accept SIGUSR1 as a request to stop monitoring my application's presence. I signal it with

kill -SIGUSR1 `pidof myapp`

This works really well. My problem comes when I try to signal an older version of the watchdog which does not have this functionality built in. In this case, the kill signal kills the watchdog (terminates the process), which leads to further complications (rebooting of the device).

Is there a way to signal my watchdog with SIGUSR1 so that it does not terminate if this particular signal is unhandled?

Answer 1:

From the GNU docs about signal handling:

The SIGUSR1 and SIGUSR2 signals are set aside for you to use any way you want. They're useful for simple interprocess communication, if you write a signal handler for them in the program that receives the signal. There is an example showing the use of SIGUSR1 and SIGUSR2 in section Signaling Another Process. The default action is to terminate the process.

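On systems that provide SIGINFO (the BSDs; see the update below regarding Linux), its default action is to do nothing unless the process is the process group leader, so it may be more suitable: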

SIGINFO: Information request. In 4.4 BSD and the GNU system, this signal is sent to all the processes in the foreground process group of the controlling terminal when the user types the STATUS character in canonical mode; see section Characters that Cause Signals. If the process is the leader of the process group, the default action is to print some status information about the system and what the process is doing. Otherwise the default is to do nothing.

SIGHUP is emitted when the controlling terminal is closed, but since most daemons are not attached to a terminal it is not uncommon to use it as "reload":

Daemon programs sometimes use SIGHUP as a signal to restart themselves, the most common reason for this being to re-read a configuration file that has been changed.

BTW, your watchdog could read a config file from time to time in order to know if it should relaunch the process.
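A rough sketch of that idea in Python, assuming the "config" is nothing more than a flag file whose presence means "do not relaunch". The function name `should_relaunch` and the file name `watchdog.pause` are made up for illustration.

```python
import os
import tempfile

def should_relaunch(flag_path):
    """Return True unless the (hypothetical) pause flag file exists."""
    return not os.path.exists(flag_path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "watchdog.pause")  # hypothetical path
        print(should_relaunch(flag))  # no flag file yet
        open(flag, "w").close()       # operator asks the watchdog to pause
        print(should_relaunch(flag))  # flag file is now present
```

The watchdog would call such a check on each pass of its monitoring loop. One property of this approach is that a watchdog version that does not know about the file simply ignores it, instead of being terminated as it would be by an unhandled signal.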

My personal favorite for a watchdog is supervisor.

$ supervisorctl start someapp
someapp: started

$ supervisorctl status someapp
someapp                RUNNING    pid 16583, uptime 19:16:26

$ supervisorctl stop someapp
someapp: stopped

Run kill -l to list the signals available on your platform and try one whose default action is harmless. SIGUSR1 seems like a bad choice, because its default action is to terminate the process.

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

[UPDATE]

Carpetsmoker comments about differences in behavior between Linux and BSDs:

SIGINFO seems to work different on GNU libc & BSD; on BSD, it works as you describe, but on Linux, it either doesn't exist, or is the same as SIGPWR... The GNU libc manual seems incorrect in this regard (your kill -l output also doesn't show SIGINFO)... I don't know why GNU doesn't support it, because I find it to be very useful... – Carpetsmoker

Answer 2:

The default action on receiving SIGUSR1 is to terminate the process if no handler is installed, which means you can't do what you want with that signal once older watchdogs without a handler are deployed.
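The default action is easy to observe. This small Python sketch (the helper name `exit_status_after_sigusr1` is made up) starts a child process that installs no SIGUSR1 handler, sends it the signal, and returns the child's exit status.

```python
import signal
import subprocess
import sys

def exit_status_after_sigusr1():
    """Start a child with no SIGUSR1 handler, signal it, return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.send_signal(signal.SIGUSR1)
    return child.wait(timeout=10)

if __name__ == "__main__":
    status = exit_status_after_sigusr1()
    # subprocess reports a child killed by signal N as return code -N.
    print("killed by SIGUSR1:", status == -signal.SIGUSR1)
```

The child never gets to finish its sleep, which is what happens to the older watchdog.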

Short of updating the watchdog, there is nothing you can do (and I'm assuming that you are unable to differentiate watchdog versions from within the program prior to sending the signal).
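If the sender runs on Linux and can read /proc, one possible way to differentiate versions is to check whether the target process has a handler installed before sending anything. The sketch below is an assumption-laden illustration, not something the question or this answer confirms is available on the device: it parses the SigCgt field of /proc/<pid>/status, a hexadecimal mask of caught signals in which bit (N - 1) corresponds to signal N. The function names are hypothetical.

```python
import signal

def catches_signal(status_text, signum):
    """Return True if the SigCgt mask in the given status text has signum's bit set."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            mask = int(line.split()[1], 16)
            # Bit (signum - 1) is set when the process catches that signal.
            return bool(mask & (1 << (signum - 1)))
    return False  # field missing: treat the signal as unhandled, to be safe

def pid_catches_signal(pid, signum):
    """Linux only: read /proc/<pid>/status and check whether signum is caught."""
    with open(f"/proc/{pid}/status") as f:
        return catches_signal(f.read(), signum)

if __name__ == "__main__":
    sample = "Name:\twatchdog\nSigCgt:\t0000000000000200\n"  # made-up sample text
    print(catches_signal(sample, 10))  # bit 9 set, so signal 10 is caught
```

The sender would call `pid_catches_signal(pid, signal.SIGUSR1)` and only send the signal when it returns True. The check is inherently racy (the process could exit or be replaced between the check and the kill), and it does nothing on systems without /proc.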