Why does this code generate much more assembly tha

Posted 2020-06-02 23:57

example::f1:
  push rbp
  mov rbp, rsp
  mov al, sil
  mov cl, dil
  mov dl, cl
  xor dl, -1
  test dl, 1
  mov byte ptr [rbp - 3], al
  mov byte ptr [rbp - 4], cl
  jne .LBB0_1
  jmp .LBB0_3
.LBB0_1:
  mov byte ptr [rbp - 2], 1
  jmp .LBB0_4
.LBB0_2:
  mov byte ptr [rbp - 2], 0
  jmp .LBB0_4
.LBB0_3:
  mov al, byte ptr [rbp - 4]
  test al, 1
  jne .LBB0_7
  jmp .LBB0_6
.LBB0_4:
  mov al, byte ptr [rbp - 2]
  and al, 1
  movzx eax, al
  pop rbp
  ret
.LBB0_5:
  mov byte ptr [rbp - 1], 1
  jmp .LBB0_8
.LBB0_6:
  mov byte ptr [rbp - 1], 0
  jmp .LBB0_8
.LBB0_7:
  mov al, byte ptr [rbp - 3]
  test al, 1
  jne .LBB0_5
  jmp .LBB0_6
.LBB0_8:
  test byte ptr [rbp - 1], 1
  jne .LBB0_1
  jmp .LBB0_2

Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):

push    rbp
mov     rbp, rsp
xor     dil, 1
or      dil, sil
mov     eax, edi
pop     rbp
ret

A few things:

Why is it still longer than the C++ version?

The Rust version is exactly three instructions longer:
```
push    rbp
mov     rbp, rsp
[...]
pop     rbp
```
These instructions manage the so-called frame pointer or base pointer (rbp), which is mainly needed to get nice stack traces. If you keep the frame pointer in the C++ version by passing -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++ since I haven't found a comparable option for the clang compiler.

Why doesn't Rust omit the frame pointer?

Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel", you get this output:
```
f1:
    xor dil, 1
    or  dil, sil
    mov eax, edi
    ret
```
Which is exactly the output of your C++ version.
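To make the two optimized instructions easier to read, here is a rough sketch in Rust. The body of `f1` is an assumption inferred from the unoptimized assembly (it branches on `!a`, then evaluates `a && b`); the helper names `f1_as_compiled` and `check_all_inputs` are hypothetical and only serve the illustration.

```rust
/// Assumed source of the function, reconstructed from the assembly listings.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hypothetical model of the optimized code, operating on bytes that hold 0 or 1.
pub fn f1_as_compiled(a: u8, b: u8) -> u8 {
    let dil = a ^ 1; // xor dil, 1  (logical NOT of a)
    dil | b          // or  dil, sil
}

/// Compares both versions over all four input combinations.
pub fn check_all_inputs() -> bool {
    [false, true].iter().all(|&a| {
        [false, true]
            .iter()
            .all(|&b| f1(a, b) as u8 == f1_as_compiled(a as u8, b as u8))
    })
}
```

Under that assumption, the optimizer has simplified `!a || (a && b)` to `!a | b`, which is why only an `xor` and an `or` remain.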

You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.

Why -O instead of --release?

Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).

Answer 2:

Compiling with -C opt-level=3 in godbolt gives:

example::f1:
  push rbp
  mov rbp, rsp
  xor dil, 1
  or dil, sil
  mov eax, edi
  pop rbp
  ret

Which looks comparable to the C++ version. See Lukas Kalbertodt's answer for more explanation.

Note: I had to make the function pub extern to stop the compiler optimising it to nothing, as it is unused.
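As a sketch of what that note describes, the declaration might look like the following. The function body is an assumption inferred from the assembly above, and the `"C"` ABI string is simply what a bare `extern` defaults to.

```rust
// A private, unused function in a library crate is dead code, so the
// compiler emits no assembly for it. Exporting it keeps it in the output.
pub extern "C" fn f1(a: bool, b: bool) -> bool {
    // Assumed body, reconstructed from the assembly listings.
    !a || (a && b)
}
```

Making the function `pub` alone is enough for a library crate, as the first answer did; `extern` additionally fixes the calling convention to the C ABI.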