Is GCC's option -O2 breaking this small program or do I have undefined behavior?

Posted 2019-02-06 08:29

Question:

This question already has an answer here:

  • Decrementing a pointer out of bounds; incrementing it into bounds [duplicate] 3 answers
  • Why is out-of-bounds pointer arithmetic undefined behaviour? 7 answers

I found this problem in a very large application and reduced it to an SSCCE. I don't know whether the code has undefined behavior or whether -O2 breaks it.

When compiling it with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5.

But it prints 25 when compiled without -O2 (e.g. with -O1), or when either of the two commented-out lines is uncommented (both prevent inlining).

#include <stdio.h>
#include <stdlib.h>
// __attribute__((noinline)) 
int f(int* todos, int input) {
    int* cur = todos-1; // fixes the ++ at the beginning of the loop
    int result = input;
    while(1) {
        cur++;
        int ch = *cur;
        // printf("(%i)\n", ch);
        switch(ch) {
            case 0:;
                goto end;
            case 1:;
                result = result*result;
            break;
        }
    }
    end:
    return result;
}
int main() {
    int todos[] = { 1, 0}; // 1:square, 0:end
    int input = 5;
    int result = f(todos, input);
    printf("=%i\n", result);
    printf("end\n");
    return 0;
}

Is GCC's option -O2 breaking this small program or do I have undefined behavior somewhere?

Answer 1:

int* cur = todos-1;

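invokes undefined behavior. todos - 1 points neither to an element of todos nor one past its last element, so the pointer arithmetic itself is invalid, even before cur is dereferenced.

The relevant rule (the key part is the final clause, "otherwise, the behavior is undefined"):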

(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
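
As a minimal sketch of one way to avoid the invalid pointer, the loop can start at the first element and advance after each read instead of before it. The name f_fixed is hypothetical and not from the question; the assert is only an added sanity check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rewrite of f() from the question. It processes the same
   todo codes, but cur never points before the start of the array. */
int f_fixed(const int *todos, int input) {
    assert(todos != NULL);
    const int *cur = todos;      /* start at the first element, not todos - 1 */
    int result = input;
    for (;;) {
        int ch = *cur++;         /* read the current element, then advance */
        switch (ch) {
            case 0:              /* 0: end of the list */
                return result;
            case 1:              /* 1: square the running value */
                result = result * result;
                break;
        }
    }
}
```

With the question's todos = { 1, 0 } and input 5, f_fixed(todos, 5) returns 25. After the terminating 0 is read, cur points one past the last element, which the quoted clause permits as long as it is not dereferenced.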



Answer 2:

To supplement @ouah's answer, this shows what the compiler is doing.

Generated assembler for reference:

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       be 05 00 00 00          mov    $0x5,%esi
  400459:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  400465:       00 
  400466:       48 83 c0 04             add    $0x4,%rax
  40046a:       8b 10                   mov    (%rax),%edx

However, if I add a printf in main():

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       bf 84 06 40 00          mov    $0x400684,%edi
  400459:       31 c0                   xor    %eax,%eax
  40045b:       48 89 e6                mov    %rsp,%rsi
  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  40046c:       00 
  40046d:       e8 ae ff ff ff          callq  400420 <printf@plt>
  400472:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  400477:       be 05 00 00 00          mov    $0x5,%esi
  40047c:       48 83 c0 04             add    $0x4,%rax
  400480:       8b 10                   mov    (%rax),%edx

Specifically, in the printf version, these two instructions populate the todos array:

  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)

The first of these stores is conspicuously missing from the non-printf version, which for some reason only assigns the second element:

  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)