When I compile a switch statement with optimization in GCC, it sets up a jump table like this,
(fcn) sym.foo 148
sym.foo (unsigned int arg1);
; arg unsigned int arg1 @ rdi
0x000006e0 83ff06 cmp edi, 6 ; arg1
0x000006e3 0f87a7000000 ja case.default.0x790
0x000006e9 488d156c0100. lea rdx, [0x0000085c]
0x000006f0 89ff mov edi, edi
0x000006f2 4883ec08 sub rsp, 8
0x000006f6 486304ba movsxd rax, dword [rdx + rdi*4]
0x000006fa 4801d0 add rax, rdx ; '('
;-- switch.0x000006fd:
0x000006fd ffe0 jmp rax ; switch table (7 cases) at 0x85c
Is the MOVSXD and ADD the best way to do that,
movsxd rax, dword [rdx + rdi*4]
add rax, rdx
Isn't that the same as using LEA with displacement,
lea rax, [rdx + rdi*4 + rdx]
It occurs to me that I probably don't understand what's going on here. RDX seems to be the start of the jump table. RDI is the incoming argument to the switch statement. Why are we adding RDX twice though?
This is the switch statement I was compiling with -O3,
#include <stdio.h>  /* for puts */

int foo (int x) {
    switch(x) {
        //case 0: puts("\nzero"); break;
        case 1: puts("\none"); break;
        case 2: puts("\ntwo"); break;
        case 3: puts("\nthree"); break;
        case 4: puts("\nfour"); break;
        case 5: puts("\nfive"); break;
        case 6: puts("\nsix"); break;
    }
    return 0;
}
It could be done with just an add with a qword memory operand. Of course the downside is that it makes the table twice as big.
No, lea does not access memory. It only computes an address, so it cannot fetch the table entry that movsxd loads.
RDX is used twice for different purposes. The first time it is used as the base of the table to index into it. The table holds addresses relative to itself, so adding RDX to the value from the table creates an absolute address.
By the way, this could easily be improved: the mov edi, edi at 0x6f0 is a self-mov. A self-mov cannot be mov-eliminated on current architectures, so it would be better to mov to some other register.
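To make the difference between the movsxd/add pair and the proposed lea concrete, here is a rough C model of the arithmetic only. The names jump_target, lea_like and sample_table are hypothetical. The first table entry is derived from the listing (0x790 - 0x85c), assuming index 0 goes to the default label because case 0 is commented out; the second entry is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table contents: 32-bit offsets relative to the table itself.
   Entry 0 is 0x790 - 0x85c (assumed: index 0 jumps to the default label).
   Entry 1 is invented for illustration. */
static const int32_t sample_table[2] = { -0xcc, -0x100 };

/* Models:  movsxd rax, dword [rdx + rdi*4]
            add    rax, rdx
   table_addr plays the role of RDX, index the role of RDI. */
uint64_t jump_target(uint64_t table_addr, const int32_t *table,
                     uint32_t n, uint32_t index)
{
    assert(index < n);            /* the real code range-checks with cmp/ja */
    int64_t rel = table[index];   /* memory load, sign-extended to 64 bits */
    return table_addr + (uint64_t)rel;   /* relative -> absolute */
}

/* Models what a lea-style computation would produce: pure address
   arithmetic with no load, so the table contents never enter the result.
   (A real x86-64 addressing mode also allows only one base register and
   one scaled index register.) */
uint64_t lea_like(uint64_t table_addr, uint32_t index)
{
    return table_addr + (uint64_t)index * 4 + table_addr;
}
```

With the table at 0x85c, jump_target(0x85c, sample_table, 2, 0) gives 0x790, the default label from the listing, while lea_like(0x85c, 0) gives 0x10b8, which has nothing to do with any case label.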
GCC is using relative displacements in its jump table (relative to the base of the table), instead of absolute addresses. So the jump table itself is position-independent, and doesn't need fixups when it's relocated, e.g. as part of loading a PIE executable or a PIC shared library.
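The same idea can be written by hand using the GNU C "labels as values" extension, which is not ISO C but is accepted by GCC and Clang. This is only a sketch of the technique, not what GCC emits for a switch; the function name classify and its return values are made up for illustration.

```c
/* Sketch using the GNU C labels-as-values extension (GCC/Clang only). */
__attribute__((noinline))
int classify(unsigned i)
{
    /* Each entry is the distance from the first label to a target label.
       Differences between two labels in the same function are link-time
       constants, so this table needs no load-time fixups. */
    static const int rel[] = {
        &&zero - &&zero,
        &&one  - &&zero,
        &&two  - &&zero
    };

    if (i > 2)
        return -1;                /* out of range: the "default" case */

    goto *(&&zero + rel[i]);      /* base + relative offset = absolute target */

zero:
    return 0;
one:
    return 10;
two:
    return 20;
}
```

A table of plain label addresses (static void *const abs[] = { &&zero, &&one, &&two };) would instead hold absolute addresses, which is the kind of data that needs runtime fixups in position-independent code.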
If you compile with -fno-pie -no-pie, gcc might choose to use a table of jump targets with jmp [table + rdi*8].
Targets like x86-64 Linux do support runtime data fixups, so a simple jump table would be possible. But some targets don't support fixups at all, which is why gcc -fPIC/-fpie avoids it entirely. This potential optimization is gcc bug 84011. See discussion there for more.
It's unfortunate gcc is using a jump table instead of realizing that the only difference between each case is the data, not code. So really it just needs a table lookup of string pointers. (Which could be done with relative displacements if it wanted to.)
That's a separate missed optimization, which I reported as bug 85585. (That reminds me, I have a followup to that half-written which I should finish and post.)
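As a hand-written sketch of the data-only lookup described above, using relative displacements rather than absolute string pointers: the struct layout and the names messages, msgs, REL, message_for and foo_lookup are all hypothetical, and this is not a claim about what any compiler generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets and strings live in one object, so each offset from the start of
   the object is a compile-time constant and needs no relocation. */
struct messages {
    int32_t rel[6];
    char one[5], two[5], three[7], four[6], five[6], six[5];
};

#define REL(m) ((int32_t)offsetof(struct messages, m))

static const struct messages msgs = {
    { REL(one), REL(two), REL(three), REL(four), REL(five), REL(six) },
    "\none", "\ntwo", "\nthree", "\nfour", "\nfive", "\nsix"
};

/* Load a 32-bit offset and add the table base: the same movsxd + add
   pattern, but the result is a data pointer instead of a jump target. */
const char *message_for(int x)
{
    assert(x >= 1 && x <= 6);
    const char *base = (const char *)&msgs;
    return base + msgs.rel[x - 1];
}

/* Same observable behavior as foo: print for 1..6, do nothing otherwise. */
int foo_lookup(int x)
{
    if (x >= 1 && x <= 6)
        puts(message_for(x));
    return 0;
}
```

The simpler variant is an array of const char * indexed by x - 1, at the cost of 8-byte entries that hold absolute addresses.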