Is GCC's option -O2 breaking this small progra

2019-02-06 08:19发布

I found this problem in a very large application, have made an SSCCE from it. I don't know whether the code has undefined behavior or -O2 breaks it.

When compiling it with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5.

But it prints 25 when compiling without -O2 (eg -O1) or uncommenting one of the 2 commented lines (prevent inlining).

#include <stdio.h>
#include <stdlib.h>
// __attribute__((noinline)) 
int f(int* todos, int input) {
    int* cur = todos-1; // fixes the ++ at the beginning of the loop
    int result = input;
    while(1) {
        cur++;
        int ch = *cur;
        // printf("(%i)\n", ch);
        switch(ch) {
            case 0:;
                goto end;
            case 1:;
                result = result*result;
            break;
        }
    }
    end:
    return result;
}
int main() {
    int todos[] = { 1, 0}; // 1:square, 0:end
    int input = 5;
    int result = f(todos, input);
    printf("=%i\n", result);
    printf("end\n");
    return 0;
}

Is GCC's option -O2 breaking this small program or do I have undefined behavior somewhere?

2条回答
在下西门庆
2楼-- · 2019-02-06 08:51
int* cur = todos-1;

invokes undefined behavior. todos - 1 is an invalid pointer address.

Emphasis mine:

(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

查看更多
叼着烟拽天下
3楼-- · 2019-02-06 09:02

In supplement to @ouah's answer, this explains what the compiler is doing.

Generated assembler for reference:

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       be 05 00 00 00          mov    $0x5,%esi
  400459:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  400465:       00 
  400466:       48 83 c0 04             add    $0x4,%rax
  40046a:       8b 10                   mov    (%rax),%edx

However if I add a printf in main():

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       bf 84 06 40 00          mov    $0x400684,%edi
  400459:       31 c0                   xor    %eax,%eax
  40045b:       48 89 e6                mov    %rsp,%rsi
  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  40046c:       00 
  40046d:       e8 ae ff ff ff          callq  400420 <printf@plt>
  400472:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  400477:       be 05 00 00 00          mov    $0x5,%esi
  40047c:       48 83 c0 04             add    $0x4,%rax
  400480:       8b 10                   mov    (%rax),%edx

Specifically (in the printf version), these two instructions populate the todo array

  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)

This is conspicuously missing from the non-printf version, which for some reason only assigns the second element:

  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
查看更多
登录 后发表回答