I am writing a shell script which performs a task periodically and on receiving a USR1 signal from another process.
The structure of the script is similar to this answer:
#!/bin/bash
trap 'echo "doing some work"' SIGUSR1
while :
do
sleep 10 && echo "doing some work" &
wait $!
done
However, this script has the problem that the sleep process continues in the background and only dies on its timeout. (Note that when USR1 is received during wait $!, the sleep process lingers for its regular timeout, but the periodic echo indeed gets cancelled.) You can, for example, see the number of sleep processes on your machine using pkill -0 -c sleep.
I read this page, which suggests killing the lingering sleep in the trap action, e.g.
#!/bin/bash
pid=
trap '[[ $pid ]] && kill $pid; echo "doing some work"' SIGUSR1
while :
do
sleep 10 && echo "doing some work" &
pid=$!
wait $pid
pid=
done
However this script has a race condition if we spam our USR1 signal fast e.g. with:
pkill -USR1 trap-test.sh; pkill -USR1 trap-test.sh
then it will try to kill a PID which was already killed and print an error. Not to mention, I do not like this code.
Is there a better way to reliably kill the forked process when interrupted? Or an alternative structure to achieve the same functionality?
Neither of your scripts terminates sleep, and you're making it more complicated by sending USR1 using pkill. As the background job is a fork of the foreground one, they share the same name (trap-test.sh); so pkill matches and signals both. This, in an uncertain order, kills the background process (leaving sleep alive, explained below) and triggers the trap in the foreground one, hence the race condition.
Besides, in the examples you linked, the background job is always a mere sleep x, but in your script it is sleep 10 && echo 'doing some work', which requires the forked subshell to wait for sleep to terminate and conditionally execute echo. Compare these two:
So let's start from scratch and reproduce the main issue in a terminal.
Just in case, I disabled job control to partly emulate a non-interactive shell's behavior.
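A reproduction along these lines can be sketched as a small script. This is only an illustration under assumptions: the 30-second sleep, the variable names (job, child, survived) and the use of pgrep to find the child are choices made here, not part of the original session.

```shell
#!/bin/bash
set +m                                   # job control off (the default in scripts)

# Same shape as the background job in the question: a subshell that
# waits for sleep and then runs echo.
( sleep 30 && echo "doing some work" ) &
job=$!
sleep 0.5                                # give the subshell time to start sleep

child=$(pgrep -P "$job" -x sleep)        # PID of the sleep started by the subshell
kill "$job"                              # TERM is delivered to the subshell only
wait "$job" 2>/dev/null

survived=no
if [ -n "$child" ] && kill -0 "$child" 2>/dev/null; then
    survived=yes
    echo "sleep ($child) outlived its parent"
    kill "$child"                        # manual cleanup
fi
```

The final kill is the manual cleanup step; without it the orphaned sleep would keep running until its own timeout.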
Killing the background job didn't kill sleep; I needed to terminate it manually. This happened because a signal sent to a process is not automatically broadcast to that process's children; i.e. sleep didn't receive the TERM signal at all.
To kill sleep as well as the subshell, I need to put the background job into a separate process group (which requires job control to be enabled, otherwise all jobs are put into the main shell's process group, as seen in pstree's output above) and send the TERM signal to that group, as shown below.
With some refinement and adaptation of this concept, your script looks like:
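A minimal sketch of that concept, under some assumptions: the function names (do_work, on_usr1, run_cycle, main_loop) and the interval variable are hypothetical and only structure the example, and the loop is wrapped in a function so it can be started explicitly. The essential parts are set -m, the negative PID passed to kill, and the background job ignoring USR1.

```shell
#!/bin/bash
set -m              # job control on: each background job gets its own process group

interval=10         # seconds between periodic runs (assumed value)
bg_pid=             # PID, and process group ID, of the current background job

do_work() { echo "doing some work"; }

on_usr1() {
    # A negative PID signals the whole process group: the subshell and its sleep.
    [ -n "$bg_pid" ] && kill -- "-$bg_pid" 2>/dev/null
    do_work
}

run_cycle() {
    # The job ignores USR1, so a name-based pkill only affects the foreground shell.
    ( trap '' USR1; sleep "$interval" && do_work ) &
    bg_pid=$!
    wait "$bg_pid"  # returns early when a trapped signal arrives
    bg_pid=
}

main_loop() {
    trap on_usr1 USR1
    echo "my PID is $$"
    while :; do run_cycle; done
}
```

Calling main_loop as the last line of the script starts the loop. Errors from kill are discarded, so a second USR1 arriving after the group has already been terminated does not print anything.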
This will print my PID is xxx (where xxx is the PID of the foreground process) and start looping. Sending a USR1 signal to xxx (i.e. kill -USR1 xxx) will trigger the trap and cause the background process and its children to terminate. Thus wait will return and the loop will continue.
If you use pkill instead it'll work anyway, as the background process ignores USR1.
For further information, see:
$$ and $!, kill specification (-$! usage), wait specification.
You might want to use a function that kills the whole process tree including children, tries to kill it nicely, and kills it by force if niceness isn't working. Here's the part you can add to your script.
TrapQuit is called on SIGUSR1 or other exit signals received (including CTRL+C). You can add whatever handling is needed in TrapQuit, or call it on a normal script exit with an exit code.
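One possible shape for such a helper is sketched below. Everything except the name TrapQuit is an assumption made for illustration: list_tree and kill_tree are hypothetical helpers, bg_pid is assumed to hold the PID of the background job, the default grace period of 3 seconds is arbitrary, and pgrep is used to walk the tree.

```shell
#!/bin/bash
# Print a PID followed by all of its descendants, one per line.
list_tree() {
    local pid=$1 child
    echo "$pid"
    for child in $(pgrep -P "$pid"); do
        list_tree "$child"
    done
}

# Send TERM to the whole tree, wait up to $2 seconds, then send KILL to whatever is left.
kill_tree() {
    local root=$1 grace=${2:-3} pids pid alive i
    pids=$(list_tree "$root")            # snapshot before anything gets reparented
    kill -TERM $pids 2>/dev/null         # unquoted on purpose: one argument per PID
    for ((i = 0; i < grace * 10; i++)); do
        alive=
        for pid in $pids; do
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ -z "$alive" ] && return 0
        sleep 0.1
    done
    kill -KILL $pids 2>/dev/null
    return 0
}

# Hypothetical handler: clean up the background job, then exit with the given code.
TrapQuit() {
    [ -n "$bg_pid" ] && kill_tree "$bg_pid"
    exit "${1:-0}"
}
```

The handler could then be installed with something like trap TrapQuit USR1 INT TERM, or called directly with an exit code at the end of the script. Note that this variant exits on USR1 rather than continuing the loop, so it fits scripts that should stop when signalled.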