I wrote a simple C++ function in order to check compiler optimization:
bool f1(bool a, bool b) {
    return !a || (a && b);
}
After that I checked the equivalent in Rust:
fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
I used godbolt to check the assembler output.
The result of the C++ code (compiled by clang with the -O3 flag) is the following:
f1(bool, bool): # @f1(bool, bool)
xor dil, 1
or dil, sil
mov eax, edi
ret
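For reference, the clang output above appears to boil down to flipping a and OR-ing the result with b (xor dil, 1 followed by or dil, sil). A minimal Rust sketch, assuming that reading of the assembly, checks that this shortcut agrees with the original expression for all four inputs. The helper names f1_simplified and all_inputs_agree are made up for illustration and do not appear in the thread.

```rust
/// The expression from the question.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hypothetical helper mirroring the optimized assembly:
/// `xor dil, 1` flips `a`, `or dil, sil` ORs the result with `b`.
pub fn f1_simplified(a: bool, b: bool) -> bool {
    (a ^ true) | b
}

/// Returns true if both versions agree on every combination of inputs.
pub fn all_inputs_agree() -> bool {
    let values = [false, true];
    values
        .iter()
        .all(|&a| values.iter().all(|&b| f1(a, b) == f1_simplified(a, b)))
}
```

Calling all_inputs_agree() from a main function or a test should return true, since !a || (a && b) reduces to !a || b.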
And the result of the Rust equivalent is much longer:
example::f1:
push rbp
mov rbp, rsp
mov al, sil
mov cl, dil
mov dl, cl
xor dl, -1
test dl, 1
mov byte ptr [rbp - 3], al
mov byte ptr [rbp - 4], cl
jne .LBB0_1
jmp .LBB0_3
.LBB0_1:
mov byte ptr [rbp - 2], 1
jmp .LBB0_4
.LBB0_2:
mov byte ptr [rbp - 2], 0
jmp .LBB0_4
.LBB0_3:
mov al, byte ptr [rbp - 4]
test al, 1
jne .LBB0_7
jmp .LBB0_6
.LBB0_4:
mov al, byte ptr [rbp - 2]
and al, 1
movzx eax, al
pop rbp
ret
.LBB0_5:
mov byte ptr [rbp - 1], 1
jmp .LBB0_8
.LBB0_6:
mov byte ptr [rbp - 1], 0
jmp .LBB0_8
.LBB0_7:
mov al, byte ptr [rbp - 3]
test al, 1
jne .LBB0_5
jmp .LBB0_6
.LBB0_8:
test byte ptr [rbp - 1], 1
jne .LBB0_1
jmp .LBB0_2
I also tried the -O option, but the output is empty (the unused function is deleted).
I am intentionally NOT using any library in order to keep the output clean. Please note that both clang and rustc use LLVM as a backend. What explains this huge output difference? And if it is only a problem of optimizations being switched off, how can I see optimized output from rustc?
Compiling with -C opt-level=3 in godbolt gives output that looks comparable to the C++ version. See Lukas Kalbertodt's answer for more explanation.
Note: I had to make the function pub extern to stop the compiler optimising it to nothing, as it is unused.

It doesn't (the actual difference is much smaller than shown in the question). I'm surprised nobody checked the C++ output:
godbolt C++ x64 clang 4.0, no compiler options
godbolt Rust 1.18, no compiler options
To get the same asm code, you need to disable debug info; this will remove the frame pointer pushes:
-C opt-level=3 -C debuginfo=0
(https://godbolt.org/g/vdhB2f)

Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):
A few things:
Why is it still longer than the C++ version?
The Rust version is exactly three instructions longer (push rbp / mov rbp, rsp / pop rbp). These are the instructions that manage the so-called frame pointer or base pointer (rbp). This is mainly required to get nice stack traces. If you disable frame pointer omission for the C++ version via -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++, since I haven't found a comparable option for the clang compiler.

Why doesn't Rust omit the frame pointer?
Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with
rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel"
you get output that is exactly the output of your C++ version.
You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.

Why -O instead of --release?
Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).
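Putting the points above together, a minimal sketch of a source file that survives optimization might look like the following. It assumes the file is saved as foo.rs (the name used in the rustc command quoted earlier) and that adding pub is enough to keep the otherwise unused function in a library crate; the only change from the question's code is the pub keyword.

```rust
// foo.rs: file name assumed to match the rustc command quoted in the answer.
//
// `pub` exports the function from the library crate, so the compiler
// cannot delete it as unused when optimizations are enabled.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
```

With this file, the command rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel" from the answer should write the assembly to foo.s; on godbolt, passing -C opt-level=3 -C debuginfo=0 is the equivalent suggested above.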