Why does ICC produce “inc” instead of “add” in ass

Posted 2019-07-15 17:10

Question:

While fiddling with simple C code, I noticed something strange. Why does ICC produce incl %eax in the assembly generated for an increment instead of addl $1, %eax? GCC behaves as expected, though, using add.

Example code (-O3 used on both GCC and ICC)

int A, B, C, D, E;

void foo()
{
    A = B + 1;
    B = 0;
    C++;
    D++;
    D++;
    E += 2;
}

Result on ICC

L__routine_start_foo_0:
foo:
    movl      B(%rip), %eax                                 #5.13
    movl      D(%rip), %edx                                 #8.9
    incl      %eax                                          #5.17
    movl      E(%rip), %ecx                                 #10.9
    addl      $2, %edx                                      #9.9
    addl      $2, %ecx                                      #10.9
    movl      %eax, A(%rip)                                 #5.9
    movl      $0, B(%rip)                                   #6.9
    incl      C(%rip)                                       #7.9
    movl      %edx, D(%rip)                                 #9.9
    movl      %ecx, E(%rip)                                 #10.9
    ret   

For example, see here.

As such, I'm wondering - is this an intended feature, a bug or some quirk resulting from some specific setting? If add is (supposedly) better due to flags update or efficiency (which is the conclusion based on the links below) - why does ICC use inc?

Related:

Relative performance of x86 inc vs. add instruction

Is ADD 1 really faster than INC ? x86

GCC doesn't make use of inc

Note:

I'm asking this question explicitly because none of the questions I found or was directed to on SO explains this behaviour. My previous question concerning this matter got closed because, supposedly, it's trivial and has been answered. I don't find it trivial. I didn't find an answer in any of the links and answers given. It's not another "how to plug my mouse into my PC" problem. All of the questions explain why add is/could be better on new x86 processors or why GCC uses it, but none concerns ICC.

Any insight into ICC's design choices would also be very welcome.

PS I don't consider "it does it because it does" a valid answer.

Answer 1:

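It is not unreasonable to assume at this point that incl was selected because it is shorter than addl $1, %eax, which takes three bytes (0x83 0xc0 0x01). Note that the one-byte encoding (0x40) only exists in 32-bit code; in x86-64, which the listing above targets (it uses %rip-relative addressing), 0x40 through 0x4f are REX prefixes, so incl %eax is encoded in two bytes (0xff 0xc0), still one byte less than the add.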
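
The size argument can be checked against the standard opcode encodings (FF /0 for inc r/m32, 83 /0 ib for add r/m32, imm8, and the short 40+r form of inc that is valid only outside 64-bit mode). The following is a minimal sketch in C that tabulates them. The table, the ROW macro and the helper encoding_length are hypothetical names made up for illustration, and the four zero bytes in the memory forms are a placeholder for the real %rip-relative displacement of C, which the linker would fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a small hand-written encoding table (illustrative, not exhaustive). */
struct encoding {
    const char *mnemonic;        /* AT&T syntax, as in the ICC listing */
    int bits;                    /* 32 or 64: the mode the bytes are valid in */
    const unsigned char *bytes;  /* machine-code bytes */
    size_t length;               /* number of bytes */
};

/* Short form 40+r: a REX prefix in 64-bit mode, so usable only in 32-bit code. */
static const unsigned char INC_EAX_32[]  = { 0x40 };
/* FF /0 with ModRM 0xc0 (register-direct, %eax). */
static const unsigned char INC_EAX_64[]  = { 0xff, 0xc0 };
/* 83 /0 ib with ModRM 0xc0 and an 8-bit immediate of 1. */
static const unsigned char ADD1_EAX[]    = { 0x83, 0xc0, 0x01 };
/* ModRM 0x05 selects RIP-relative addressing; disp32 is a zero placeholder. */
static const unsigned char INC_MEM_64[]  = { 0xff, 0x05, 0, 0, 0, 0 };
static const unsigned char ADD1_MEM_64[] = { 0x83, 0x05, 0, 0, 0, 0, 0x01 };

#define ROW(m, b, arr) { (m), (b), (arr), sizeof(arr) }

static const struct encoding ENCODINGS[] = {
    ROW("incl %eax",        32, INC_EAX_32),
    ROW("incl %eax",        64, INC_EAX_64),
    ROW("addl $1, %eax",    32, ADD1_EAX),
    ROW("addl $1, %eax",    64, ADD1_EAX),
    ROW("incl C(%rip)",     64, INC_MEM_64),
    ROW("addl $1, C(%rip)", 64, ADD1_MEM_64),
};

/* Hypothetical helper: encoded length in bytes, or 0 if the pair is not in the table. */
size_t encoding_length(const char *mnemonic, int bits)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof(ENCODINGS) / sizeof(ENCODINGS[0]); i++) {
        if (ENCODINGS[i].bits == bits &&
            strcmp(ENCODINGS[i].mnemonic, mnemonic) == 0)
            return ENCODINGS[i].length;
    }
    return 0;
}
```

With this table, encoding_length("incl %eax", 64) gives 2 and encoding_length("addl $1, %eax", 64) gives 3, while the memory forms come out at 6 and 7 bytes. In each case inc saves exactly one byte per increment, which fits the code-size explanation but does not by itself prove that this is ICC's actual reason.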