Why do I get a zombie when I link assembly code wi

Posted 2019-07-22 20:22

Question:

I was experimenting with assembly code and the GTK+ 3 libraries when I discovered that my application turns into a zombie if I don't link the object file with gcc against the standard library. Here is my code for the stdlib-free application:

%include "gtk.inc"
%include "glib.inc"

global _start

SECTION .data    
destroy         db "destroy", 0     ; const gchar*
strWindow       db "Window", 0              ; const gchar*

SECTION .bss    
window         resq 1 ; GtkWindow *

SECTION .text    
_start:
    ; gtk_init (&argc, &argv);
    xor     rdi, rdi
    xor     rsi, rsi
    call    gtk_init

    ; window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    xor     rdi, rdi
    call    gtk_window_new
    mov     [window], rax

    ; gtk_window_set_title (GTK_WINDOW (window), "Window");
    mov     rdi, rax
    mov     rsi, strWindow
    call    gtk_window_set_title

    ; g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);
    mov     rdi, [window]
    mov     rsi, destroy
    mov     rdx, gtk_main_quit
    xor     rcx, rcx
    xor     r8, r8
    xor     r9, r9
    call    g_signal_connect_data

    ; gtk_widget_show (window);
    mov     rdi, [window]
    call    gtk_widget_show

    ; gtk_main ();
    call    gtk_main

    mov     rax, 60 ; SYS_EXIT
    xor     rdi, rdi
    syscall

And here is the same code, meant to be linked against the standard library:

%include "gtk.inc"
%include "glib.inc"

global main

SECTION .data    
destroy         db "destroy", 0     ; const gchar*
strWindow       db "Window", 0              ; const gchar*

SECTION .bss
window         resq 1 ; GtkWindow *

SECTION .text    
main:
    push    rbp
    mov     rbp, rsp

    ; gtk_init (&argc, &argv);
    xor     rdi, rdi
    xor     rsi, rsi
    call    gtk_init

    ; window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    xor     rdi, rdi
    call    gtk_window_new
    mov     [window], rax

    ; gtk_window_set_title (GTK_WINDOW (window), "Window");
    mov     rdi, rax
    mov     rsi, strWindow
    call    gtk_window_set_title

    ; g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);
    mov     rdi, [window]
    mov     rsi, destroy
    mov     rdx, gtk_main_quit
    xor     rcx, rcx
    xor     r8, r8
    xor     r9, r9
    call    g_signal_connect_data

    ; gtk_widget_show (window);
    mov     rdi, [window]
    call    gtk_widget_show

    ; gtk_main ();
    call    gtk_main

    pop     rbp
    ret

Both applications create a GtkWindow. However, the two behave differently when the window is closed. The former leads to a zombie process and I need to press Ctrl+C. The latter exhibits the expected behaviour, i.e. the application terminates as soon as the window is closed.

My feeling is that the standard lib is performing some essential operations that I am neglecting in the first code sample, but I can't tell what it is.

So my question is: what's missing in the first code sample?

Answer 1:

Thanks @MichaelPetch for this idea which explains all the observed symptoms perfectly:

If gtk_main leaves any threads running when it returns, the most important difference between your two programs is that eax=60 / syscall (sys_exit) only exits the current thread. See the _exit(2) man page, which points out that glibc's _exit() wrapper function has used exit_group since glibc 2.3.

exit_group(2) is eax=231 / syscall in the x86-64 ABI. This is what the CRT startup/cleanup code runs when main() returns.

You can see this by using strace ./a.out on both versions.
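The same difference can be reproduced without GTK. The following is a minimal C sketch, assuming Linux with glibc; the names linger and child_terminates are made up for this illustration and are not part of GTK or of the question's code. It forks a child that starts one extra thread and then has its initial thread issue the raw system call, so the two exit paths can be compared side by side.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the helper threads GTK leaves running. */
static void *linger(void *arg)
{
    (void)arg;
    for (;;)
        pause();            /* sleep until a signal arrives */
    return NULL;
}

/*
 * Fork a child that starts one lingering thread, then has its initial
 * thread make the raw system call nr (SYS_exit or SYS_exit_group).
 * Returns 1 if the whole child process terminated within about one
 * second, 0 if it was still alive and had to be killed, -1 if fork failed.
 */
static int child_terminates(long nr)
{
    assert(nr == SYS_exit || nr == SYS_exit_group);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, linger, NULL) != 0)
            syscall(SYS_exit_group, 1);
        syscall(nr, 0);     /* 60 = exit, 231 = exit_group on x86-64 */
        _exit(127);         /* guard; not expected to be reached */
    }

    /* Poll for up to ~1 second (100 x 10 ms) for the child to be reapable. */
    for (int i = 0; i < 100; i++) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return 1;
        struct timespec ts = { 0, 10 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }

    /* Initial thread is gone but the process lives on: clean it up. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 0;
}
```

Calling child_terminates(SYS_exit_group) and child_terminates(SYS_exit) from a small main, built with gcc -pthread, should show that only the exit_group variant ends the whole process; with plain exit the parent cannot reap the child until the remaining thread is gone.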


This surprised me, at least: a process whose initial thread has exited while other threads are still running is shown as a zombie. I tried it on my own desktop (see the end of this answer for build commands and extern declarations, so you don't need gtk.inc), and you really do get a process that's reported as a zombie, but one where Ctrl+C still kills the other threads that GTK leaves running when gtk_main returns.

$ ./thread-exit &   # or in the foreground, and do the following commands in another shell
[1] 20592

$ ps m -LF -p $(pidof thread-exit)
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
peter    20592  7749     -  0    3 109031 21920  - 06:28 pts/12   -      0:00 ./thread-exit
peter        -     - 20592  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20593  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20594  0    -     -     -   0 06:28 -        Sl     0:00 -

Then close the window: the process doesn't exit, and still has two threads running + 1 zombie.

$ ps m -LF -p $(pidof thread-exit)
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
peter    20592  7749     -  0    3     0     0   - 06:28 pts/12   -      0:00 [thread-exit] <defunct>
peter        -     - 20592  0    -     -     -   0 06:28 -        Zl     0:00 -
peter        -     - 20593  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20594  0    -     -     -   0 06:28 -        Sl     0:00 -

I'm not sure if ps m -LF is the best command for this, but it seems to work. It indicates that only the main thread has exited after you close the window, and 2 other threads are still running. You can even look at /proc/$(pidof thread-exit)/task directly, instead of using ps to do that for you.
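For the /proc route, here is a minimal C sketch, assuming a Linux system with /proc mounted; count_tasks is a hypothetical helper name, not an existing API. It counts the per-thread directories under /proc/<pid>/task, which is the same information ps m -L summarizes.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count the threads of process pid by listing /proc/<pid>/task.
 * Each thread (LWP) has one numerically named subdirectory there.
 * Returns the number of entries, or -1 if the directory cannot be opened.
 */
static int count_tasks(pid_t pid)
{
    char path[64];
    int len = snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    assert(len > 0 && (size_t)len < sizeof path);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')    /* skip "." and ".." */
            n++;
    }
    closedir(dir);
    return n;
}
```

For example, count_tasks(getpid()) reports the calling process's own thread count; passing the PID printed by pidof thread-exit would list the threads of the GTK test program instead.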


re: comments about not wanting to link libc:

Avoiding glibc's CRT startup / cleanup code (by defining _start instead of main) isn't the same thing as avoiding libc. Your code doesn't call any libc functions directly, but libgtk does. ldd /usr/lib/x86_64-linux-gnu/libgtk-3.so.0 shows that libgtk depends on libc, so the dynamic linker will map libc into your process anyway. In fact, ldd on your own program shows the same thing, even if you don't put -lc on the linker command line directly.

So you could just link libc and call exit(3) from your _start.

See this Q&A for info on building static vs. dynamic binaries that link libc or not and define _start or main, with NASM or gas.


Side-note: the version that defines main doesn't need to make a stack frame with rbp.

If you leave out the push rbp / mov rbp, rsp, you still have to do something to align the stack before the call, but it can be push rax, or still push rbp if you want to be confusing. So:

main:
    push    rax              ; align the stack
    ...
    call    gtk_widget_show

    pop     rax              ; restore stack to function-entry state
    jmp     gtk_main         ; optimized tail-call

If you want to keep the frame-pointer stuff, you could still do the tail call, but with pop rbp / jmp gtk_main.


PS: for those who want to try it themselves, this change lets you build it without having to go looking for a gtk.inc:

;%include "gtk.inc"
;%include "glib.inc"

extern gtk_init
extern gtk_window_new
extern g_signal_connect_data
extern gtk_window_set_title
extern gtk_widget_show
extern gtk_main
extern gtk_main_quit

Build with:

yasm -felf64 -Worphan-labels -gdwarf2 thread-exit.asm &&
gcc -nostdlib -o thread-exit thread-exit.o $(pkg-config --libs gtk+-3.0)