malloc() inside a Linux signal handler causes deadlock

Posted 2019-05-11 13:37

Question:

First of all, sorry for calling malloc() inside a signal handler :). I understand that we should not do time-consuming work or this kind of nasty stuff inside a signal handler.

But I am curious to know why it crashed.

 #0  0x00006e3ff2b60dce in _lll_lock_wait_private () from /lib64/libc.so.6
 #1  0x00006e3ff2aec138 in _L_lock_9164 () from /lib64/libc.so.6
 #2  0x00006e3ff2ae9a32 in malloc () from /lib64/libc.so.6
 #3  0x00006e3ff1f691ad in ?? () from ..

I found a similar core dump reported at https://access.redhat.com/solutions/48701 .

Operating system: RHEL

Answer 1:

malloc() cannot be safely called from a signal handler because it is not an async-signal-safe function, so you should never call it there. Only a limited set of functions may be called from a signal handler; see the signal-safety(7) man page for the list.

Looking at your GDB output, it appears that the signal arrived while malloc() was holding its internal lock, and the handler then called malloc() again, which results in a deadlock.
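A common way to follow this advice is to let the handler do nothing except set a flag, and to do the allocation later in normal program flow. The following is a minimal sketch of that pattern, not the OP's code; the names got_signal, on_signal, install_handler and handle_pending_signal are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Written by the handler, read by normal code (hypothetical name). */
static volatile sig_atomic_t got_signal = 0;

/* The handler only records that the signal arrived; no malloc(), no stdio. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal program flow, e.g. once per main-loop iteration.
 * If a signal was recorded, clear the flag and allocate here, outside
 * the handler. Returns the buffer (caller frees it), or NULL if no
 * signal was pending or the allocation failed. */
static char *handle_pending_signal(size_t size)
{
    if (!got_signal)
        return NULL;
    got_signal = 0;
    return malloc(size);
}
```

The program would call install_handler(SIGUSR1) once at startup and handle_pending_signal() from its main loop, so malloc() only ever runs in a context where it cannot have been interrupted by the handler.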



Answer 2:

Only async-signal-safe functions can safely be called from within a signal handler.

Per the POSIX standard:

Any function not in the above [replicated below] table may be unsafe with respect to signals. Implementations may make other interfaces async-signal-safe. In the presence of signals, all functions defined by this volume of POSIX.1-2008 shall behave as defined when called from or interrupted by a signal-catching function, with the exception that when a signal interrupts an unsafe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()) and the signal-catching function calls an unsafe function, the behavior is undefined. Additional exceptions are specified in the descriptions of individual functions such as longjmp().

If you call an "unsafe function" from within a signal handler, the "behavior is undefined".

The Linux signal.7 man page states:

Async-signal-safe functions

A signal handler function must be very careful, since processing elsewhere may be interrupted at some arbitrary point in the execution of the program. POSIX has the concept of "safe function". If a signal interrupts the execution of an unsafe function, and handler either calls an unsafe function or handler terminates via a call to longjmp() or siglongjmp() and the program subsequently calls an unsafe function, then the behavior of the program is undefined.

The Linux man page provides a list of async-signal-safe functions on Linux. They may differ from those listed in the POSIX specification; I have not compared them, and standards and implementations do change over time. The "safe functions" from the POSIX "above table" in the first quote above consist of only the following functions:

_Exit()
_exit()
abort()
accept()
access()
aio_error()
aio_return()
aio_suspend()
alarm()
bind()
cfgetispeed()
cfgetospeed()
cfsetispeed()
cfsetospeed()
chdir()
chmod()
chown()
clock_gettime()
close()
connect()
creat()
dup()
dup2()
execl()
execle()
execv()
execve()
faccessat()
fchdir()
fchmod()
fchmodat()
fchown()
fchownat()
fcntl()
fdatasync()
fexecve()
ffs()
fork()
fstat()
fstatat()
fsync()
ftruncate()
futimens()
getegid()
geteuid()
getgid()
getgroups()
getpeername()
getpgrp()
getpid()
getppid()
getsockname()
getsockopt()
getuid()
htonl()
htons()
kill()
link()
linkat()
listen()
longjmp()
lseek()
lstat()
memccpy()
memchr()
memcmp()
memcpy()
memmove()
memset()
mkdir()
mkdirat()
mkfifo()
mkfifoat()
mknod()
mknodat()
ntohl()
ntohs()
open()
openat()
pause()
pipe()
poll()
posix_trace_event()
pselect()
pthread_kill()
pthread_self()
pthread_sigmask()
raise()
read()
readlink()
readlinkat()
recv()
recvfrom()
recvmsg()
rename()
renameat()
rmdir()
select()
sem_post()
send()
sendmsg()
sendto()
setgid()
setpgid()
setsid()
setsockopt()
setuid()
shutdown()
sigaction()
sigaddset()
sigdelset()
sigemptyset()
sigfillset()
sigismember()
siglongjmp()
signal()
sigpause()
sigpending()
sigprocmask()
sigqueue()
sigset()
sigsuspend()
sleep()
sockatmark()
socket()
socketpair()
stat()
stpcpy()
stpncpy()
strcat()
strchr()
strcmp()
strcpy()
strcspn()
strlen()
strncat()
strncmp()
strncpy()
strnlen()
strpbrk()
strrchr()
strspn()
strstr()
strtok_r()
symlink()
symlinkat()
tcdrain()
tcflow()
tcflush()
tcgetattr()
tcgetpgrp()
tcsendbreak()
tcsetattr()
tcsetpgrp()
time()
timer_getoverrun()
timer_gettime()
timer_settime()
times()
umask()
uname()
unlink()
unlinkat()
utime()
utimensat()
utimes()
wait()
waitpid()
wcpcpy()
wcpncpy()
wcscat()
wcschr()
wcscmp()
wcscpy()
wcscspn()
wcslen()
wcsncat()
wcsncmp()
wcsncpy()
wcsnlen()
wcspbrk()
wcsrchr()
wcsspn()
wcsstr()
wcstok()
wmemchr()
wmemcmp()
wmemcpy()
wmemmove()
wmemset()
write()
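As an illustration of working within this list, here is a sketch of the well-known "self-pipe" pattern: the handler calls only write(), and ordinary code later reads the byte and is then free to call anything, including malloc(). This is one possible approach under stated assumptions, not something prescribed by POSIX or the man page; wake_fds, self_pipe_init and self_pipe_next are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Pipe used to pass signal numbers from the handler to normal code.
 * Set once in self_pipe_init() before the handler is installed. */
static int wake_fds[2] = { -1, -1 };

/* The handler calls only write(), which is in the list above.
 * errno is saved and restored so interrupted code does not see it change. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(wake_fds[1], &byte, 1);
    (void)n; /* nothing safe to do about a failed write here */
    errno = saved_errno;
}

/* Create the pipe and install the handler; returns 0 on success, -1 on error. */
static int self_pipe_init(int signo)
{
    struct sigaction sa;
    int flags;

    if (pipe(wake_fds) != 0)
        return -1;
    /* Non-blocking write end, so a full pipe can never stall the handler. */
    flags = fcntl(wake_fds[1], F_GETFL);
    if (flags == -1 || fcntl(wake_fds[1], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Normal-context side: blocks until a signal number is available and
 * returns it, or returns -1 on error. Unsafe functions are fine after this. */
static int self_pipe_next(void)
{
    unsigned char byte;
    ssize_t n = read(wake_fds[0], &byte, 1);
    return n == 1 ? (int)byte : -1;
}
```

In a real program the read end would typically be added to the poll() or select() set of the main loop rather than read in a blocking call.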


Answer 3:

The implementation of malloc() may take an internal glibc lock. Signal handlers are called asynchronously, so if a thread is interrupted in the middle of malloc() during normal execution and the signal handler also calls malloc(), there is a problem: the handler's malloc() tries to take the lock, but the lock is already held by the same thread, which cannot release it until the handler returns. The result is a deadlock. This is why signal handlers should be lean and should not call functions that are not async-signal-safe.
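The sequence described here can be modelled without actually hanging the process. The sketch below is a toy model, not glibc's implementation: toy_alloc stands in for malloc(), a plain flag stands in for the internal lock, and the signal is delivered with raise() at exactly the moment the "lock" is held. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for malloc's internal lock: 1 while "held" (toy model only). */
static volatile sig_atomic_t toy_lock_held = 0;

/* What the handler observed when it ran. */
static volatile sig_atomic_t handler_found_lock_held = 0;

/* A handler that called the real malloc() at this point would wait for
 * the lock forever, because the interrupted code that holds it cannot
 * resume until the handler returns. Here we only record the lock state. */
static void on_signal(int signo)
{
    (void)signo;
    handler_found_lock_held = toy_lock_held;
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Toy "allocator": takes the lock, gets interrupted while holding it,
 * then releases it. raise() makes the timing deterministic. */
static void toy_alloc(int signo)
{
    toy_lock_held = 1;   /* like malloc() taking its lock         */
    raise(signo);        /* signal arrives inside the locked part */
    toy_lock_held = 0;   /* only reached after the handler returns */
}
```

After install_handler(SIGUSR1) and toy_alloc(SIGUSR1), handler_found_lock_held is 1: the handler ran while the lock was held by the very code it interrupted, which is the situation the backtrace in the question shows.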



Answer 4:

To directly answer the OP's question: some glibc wrappers (e.g. malloc arenas and printf file access) use a low-level lock for concurrency. The thread enters such a function and grabs the "lll_" lock, is interrupted by a signal, re-enters the function from the signal handler, and deadlocks.

Possible solutions:
1) The first has already been discussed above.
2) Do not use glibc wrappers; go straight to the kernel syscall. E.g. don't use printf, use write. Don't use glibc malloc, use syscall(sbrk...), which is probably not a good idea unless you REALLY have to.
3) Don't do ANY dynamic memory allocation in the handler; allocate it in the main task and access it in the handler.
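Options 2 and 3 can be combined: allocate the buffer in the main task before installing the handler, and have the handler emit it with write() instead of printf. The following is a rough sketch under those assumptions; msg_buf, out_fd and setup_handler are hypothetical names, and the buffer is assumed never to change once the handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocated and filled in the main task, before the handler is installed,
 * and treated as read-only afterwards. */
static char *msg_buf = NULL;
static size_t msg_len = 0;
static int out_fd = STDERR_FILENO;

/* The handler allocates nothing and uses no stdio: it only calls write(). */
static void on_signal(int signo)
{
    int saved_errno = errno;
    (void)signo;
    if (msg_buf != NULL) {
        ssize_t n = write(out_fd, msg_buf, msg_len);
        (void)n; /* a failed write cannot be handled safely here */
    }
    errno = saved_errno;
}

/* Main-task setup: copy the message into a heap buffer (malloc() is fine
 * here), remember the target descriptor, then install the handler.
 * Returns 0 on success, -1 on error. */
static int setup_handler(int signo, int fd, const char *text)
{
    struct sigaction sa;
    size_t len = strlen(text);
    char *buf = malloc(len + 1);

    if (buf == NULL)
        return -1;
    memcpy(buf, text, len + 1);
    msg_buf = buf;
    msg_len = len;
    out_fd = fd;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

For example, setup_handler(SIGUSR1, STDERR_FILENO, "caught signal\n") at startup makes each SIGUSR1 print that line without the handler ever touching the allocator.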