While fiddling with simple C code, I noticed something strange. Why does ICC produce incl %eax in the assembly generated for an increment instead of addl $1, %eax? GCC behaves as expected, using add.
Example code (-O3 used on both GCC and ICC):
int A, B, C, D, E;
void foo()
{
    A = B + 1;
    B = 0;
    C++;
    D++;
    D++;
    E += 2;
}
Result on ICC
L__routine_start_foo_0:
foo:
        movl      B(%rip), %eax          #5.13
        movl      D(%rip), %edx          #8.9
        incl      %eax                   #5.17
        movl      E(%rip), %ecx          #10.9
        addl      $2, %edx               #9.9
        addl      $2, %ecx               #10.9
        movl      %eax, A(%rip)          #5.9
        movl      $0, B(%rip)            #6.9
        incl      C(%rip)                #7.9
        movl      %edx, D(%rip)          #9.9
        movl      %ecx, E(%rip)          #10.9
        ret
For example, see here.
As such, I'm wondering: is this an intended feature, a bug, or some quirk resulting from some specific setting? If add is (supposedly) better due to flags update or efficiency (which is the conclusion based on the links below), why does ICC use inc?
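To spell out the "flags update" point: the architectural difference is that add writes the carry flag (CF) while inc leaves CF untouched. Below is a minimal C sketch of just that difference. The toy_cpu type and the toy_add and toy_inc functions are hypothetical names made up for illustration, and the model tracks only a 32-bit value and CF, not the other flags or any timing behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one 32-bit register plus the carry flag (CF) only. */
typedef struct {
    uint32_t value;
    bool cf;
} toy_cpu;

/* Models `addl $imm, %reg`: writes the result and overwrites CF. */
static toy_cpu toy_add(toy_cpu s, uint32_t imm)
{
    uint32_t result = s.value + imm;
    s.cf = result < s.value;   /* unsigned wraparound means a carry out */
    s.value = result;
    return s;
}

/* Models `incl %reg`: writes the result but leaves CF as it was. */
static toy_cpu toy_inc(toy_cpu s)
{
    s.value = s.value + 1u;    /* CF deliberately not touched */
    return s;
}
```

Because inc preserves the old CF, its flags result depends on whatever set CF earlier, which is the partial-flags concern the linked questions discuss. Whether that costs anything depends on the microarchitecture.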
Related:
Relative performance of x86 inc vs. add instruction
Is ADD 1 really faster than INC ? x86
Note:
I'm asking this question explicitly because none of the questions I found or was directed to on SO explains this behaviour. My previous question concerning this matter got closed because, supposedly, it's trivial and has been answered. I don't find it trivial, and I didn't find an answer in any of the links and answers given. It's not another "how to plug my mouse into my PC" problem. All of the questions explain why add is/could be better on new x86 processors or why GCC uses it, but none concerns ICC.
Any insight into ICC's design choices would also be very welcome.
PS I don't consider "it does it because it does" a valid answer.
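It is not unreasonable to assume at this point that incl was selected for its shorter encoding: in the 64-bit code shown here, incl %eax takes two bytes (0xff 0xc0) instead of the three needed for addl $1, %eax (0x83 0xc0 0x01). The one-byte form (0x40) exists only in 32-bit mode; in 64-bit mode that byte is a REX prefix.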
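To make the size comparison concrete, here is a small C sketch that stores the encodings as byte arrays and compares their lengths. It assumes 64-bit mode, since the listing above uses %rip-relative addressing. In that mode incl %eax is the two-byte 0xff 0xc0, because the single byte 0x40 is reused as a REX prefix and means inc eax only in 32-bit code. The array names and the bytes_saved_64 helper are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* incl %eax in 64-bit mode: opcode 0xff, ModRM 0xc0 (FF /0). */
static const unsigned char inc_eax_64[] = { 0xff, 0xc0 };

/* addl $1, %eax with a sign-extended 8-bit immediate (83 /0 ib). */
static const unsigned char add1_eax[] = { 0x83, 0xc0, 0x01 };

/* inc eax in 32-bit mode only; in 64-bit mode 0x40 is a REX prefix. */
static const unsigned char inc_eax_32[] = { 0x40 };

/* Bytes saved per register increment by choosing inc over add in 64-bit code. */
static size_t bytes_saved_64(void)
{
    return sizeof add1_eax - sizeof inc_eax_64;
}
```

So in 64-bit code the saving is one byte per increment, not two. Whether ICC actually picks inc for code size is still an assumption on my part.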