First of all, sorry for calling malloc inside a signal handler :). I understand that we should not do any time-consuming task or this kind of nasty stuff inside a signal handler.
But I am curious to know why it crashed.
#0 0x00006e3ff2b60dce in _lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00006e3ff2aec138 in _L_lock_9164 () from /lib64/libc.so.6
#2 0x00006e3ff2ae9a32 in malloc () from /lib64/libc.so.6
#3 0x00006e3ff1f691ad in ?? () from ..
I found a similar core reported in https://access.redhat.com/solutions/48701 .
Operating system: RHEL
malloc() is not a function that can be safely called from a signal handler. It is not an async-signal-safe function.
So you should never call malloc() from a signal handler. You are only allowed to call a limited set of functions from a signal handler.
See man signal-safety for the list of functions you can safely call from a signal handler.
Looking at your GDB output, it appears that the signal arrived while malloc() was holding a lock, and the handler then called malloc() again, which results in a deadlock.
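A common way to avoid the problem is to have the handler only set a flag and to do the allocation later in normal context. The following is a minimal sketch of that pattern, not the OP's code: the names on_signal, install_handler and handle_pending_signal are hypothetical, and it assumes a POSIX system with sigaction().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* The only state the handler touches: a flag of type sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: no malloc(), no printf(), just record that the signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal (non-handler) context, e.g. once per main-loop
 * iteration. Here malloc() is safe because no handler is running.
 * Returns NULL if no signal is pending or if malloc() fails. */
char *handle_pending_signal(size_t size)
{
    if (!got_signal)
        return NULL;
    got_signal = 0;
    return malloc(size);
}
```

The main loop would call handle_pending_signal() at a convenient point and free the returned buffer when done, so the allocator is never entered from inside the handler.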
Only async-signal-safe functions can safely be called from within a signal handler.
Per the POSIX standard:
Any function not in the above [replicated below] table may be unsafe with respect to
signals. Implementations may make other interfaces async-signal-safe.
In the presence of signals, all functions defined by this volume of
POSIX.1-2008 shall behave as defined when called from or interrupted
by a signal-catching function, with the exception that when a signal
interrupts an unsafe function or equivalent (such as the processing
equivalent to exit() performed after a return from the initial call
to main()) and the signal-catching function calls an unsafe function,
the behavior is undefined. Additional exceptions are specified in the
descriptions of individual functions such as longjmp().
If you call an "unsafe function" from within a signal handler, the "behavior is undefined".
The Linux signal.7 man page states:
Async-signal-safe functions
A signal handler function must be very careful, since processing
elsewhere may be interrupted at some arbitrary point in the execution
of the program. POSIX has the concept of "safe function". If a
signal interrupts the execution of an unsafe function, and handler
either calls an unsafe function or handler terminates via a call to
longjmp() or siglongjmp() and the program subsequently calls an
unsafe function, then the behavior of the program is undefined.
The Linux man page provides a list of async-signal-safe functions on Linux. They may differ from those listed in the POSIX specification. I have not compared them, and standards and implementations do change over time.
The "safe functions" from the POSIX "above table" in the first quote above consist of only the following functions:
_Exit()
_exit()
abort()
accept()
access()
aio_error()
aio_return()
aio_suspend()
alarm()
bind()
cfgetispeed()
cfgetospeed()
cfsetispeed()
cfsetospeed()
chdir()
chmod()
chown()
clock_gettime()
close()
connect()
creat()
dup()
dup2()
execl()
execle()
execv()
execve()
faccessat()
fchdir()
fchmod()
fchmodat()
fchown()
fchownat()
fcntl()
fdatasync()
fexecve()
ffs()
fork()
fstat()
fstatat()
fsync()
ftruncate()
futimens()
getegid()
geteuid()
getgid()
getgroups()
getpeername()
getpgrp()
getpid()
getppid()
getsockname()
getsockopt()
getuid()
htonl()
htons()
kill()
link()
linkat()
listen()
longjmp()
lseek()
lstat()
memccpy()
memchr()
memcmp()
memcpy()
memmove()
memset()
mkdir()
mkdirat()
mkfifo()
mkfifoat()
mknod()
mknodat()
ntohl()
ntohs()
open()
openat()
pause()
pipe()
poll()
posix_trace_event()
pselect()
pthread_kill()
pthread_self()
pthread_sigmask()
raise()
read()
readlink()
readlinkat()
recv()
recvfrom()
recvmsg()
rename()
renameat()
rmdir()
select()
sem_post()
send()
sendmsg()
sendto()
setgid()
setpgid()
setsid()
setsockopt()
setuid()
shutdown()
sigaction()
sigaddset()
sigdelset()
sigemptyset()
sigfillset()
sigismember()
siglongjmp()
signal()
sigpause()
sigpending()
sigprocmask()
sigqueue()
sigset()
sigsuspend()
sleep()
sockatmark()
socket()
socketpair()
stat()
stpcpy()
stpncpy()
strcat()
strchr()
strcmp()
strcpy()
strcspn()
strlen()
strncat()
strncmp()
strncpy()
strnlen()
strpbrk()
strrchr()
strspn()
strstr()
strtok_r()
symlink()
symlinkat()
tcdrain()
tcflow()
tcflush()
tcgetattr()
tcgetpgrp()
tcsendbreak()
tcsetattr()
tcsetpgrp()
time()
timer_getoverrun()
timer_gettime()
timer_settime()
times()
umask()
uname()
unlink()
unlinkat()
utime()
utimensat()
utimes()
wait()
waitpid()
wcpcpy()
wcpncpy()
wcscat()
wcschr()
wcscmp()
wcscpy()
wcscspn()
wcslen()
wcsncat()
wcsncmp()
wcsncpy()
wcsnlen()
wcspbrk()
wcsrchr()
wcsspn()
wcsstr()
wcstok()
wmemchr()
wmemcmp()
wmemcpy()
wmemmove()
wmemset()
write()
The implementation of malloc might be grabbing an internal glibc lock. We know that signal handlers are called asynchronously. If the thread was inside malloc during normal execution and was interrupted to handle a signal, we have a problem if the signal handler function uses malloc. The handler's malloc would try to get the lock, but it isn't available because the same thread acquired it during its normal execution. And you have a deadlock. It is for this reason that signal handlers should be lean and non-async-signal-safe functions should not be called from them.
To directly answer the OP's issue: some glibc wrappers (e.g. malloc arenas and printf file access) use a low-level lock for concurrency. The thread enters such a function, grabs the "lll_" lock, and is interrupted by the signal; the handler then re-enters the same function and deadlocks waiting for that lock.
Possible solutions:
1) The first has already been discussed above: do not call non-async-signal-safe functions from the handler.
2) Do not use glibc wrappers; go straight to the kernel syscall. E.g. don't use printf, use write. Don't use glibc malloc, use syscall(sbrk...), though that is probably not a good idea unless you REALLY have to.
3) Don't do ANY dynamic memory allocation in the handler; allocate the memory in the main task and access it in the handler.
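Options 2 and 3 can be combined. The sketch below is only an illustration under stated assumptions: setup, on_signal, scratch and log_fd are hypothetical names, and the buffer is assumed to be filled once before the handler is installed and never modified afterwards. The handler uses only write(), which is on the async-signal-safe list quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocated and filled in normal context, before the handler is installed. */
static char  *scratch     = NULL;
static size_t scratch_len = 0;
static int    log_fd      = STDERR_FILENO;

/* Handler: no allocation, no stdio; only write() on preallocated memory. */
static void on_signal(int signo)
{
    int saved_errno = errno;   /* write() may change errno */
    (void)signo;
    if (scratch != NULL && scratch_len > 0) {
        ssize_t n = write(log_fd, scratch, scratch_len);
        (void)n;               /* nothing safe to do about a failure here */
    }
    errno = saved_errno;
}

/* Allocate the message buffer, then install the handler for signo.
 * Returns 0 on success, -1 on allocation or sigaction() failure. */
int setup(int signo, const char *msg, int fd)
{
    size_t len = strlen(msg);
    char *buf = malloc(len ? len : 1);   /* malloc() is fine here */
    if (buf == NULL)
        return -1;
    memcpy(buf, msg, len);
    scratch     = buf;
    scratch_len = len;
    log_fd      = fd;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling setup(SIGUSR1, "caught signal\n", STDERR_FILENO) once at startup would be enough; after that the handler never enters the allocator, so it cannot block on the malloc lock seen in the backtrace.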