I found this problem in a very large application and reduced it to an SSCCE. I don't know whether the code has undefined behavior or whether -O2 breaks it.
When compiled with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror, it prints 5.
But it prints 25 when compiled without -O2 (e.g. with -O1), or after uncommenting either of the two commented lines (which prevents inlining).
#include <stdio.h>
#include <stdlib.h>

// __attribute__((noinline))
int f(int* todos, int input) {
    int* cur = todos-1; // fixes the ++ at the beginning of the loop
    int result = input;
    while(1) {
        cur++;
        int ch = *cur;
        // printf("(%i)\n", ch);
        switch(ch) {
            case 0:;
                goto end;
            case 1:;
                result = result*result;
                break;
        }
    }
end:
    return result;
}

int main() {
    int todos[] = { 1, 0}; // 1:square, 0:end
    int input = 5;
    int result = f(todos, input);
    printf("=%i\n", result);
    printf("end\n");
    return 0;
}
Is GCC's option -O2 breaking this small program, or do I have undefined behavior somewhere?
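int* cur = todos-1; invokes undefined behavior. todos - 1 is an invalid pointer: it points neither to an element of the array nor to one past its last element, so merely computing it is undefined, even though it is incremented before it is ever dereferenced.
Note the final clause: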
(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
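To make the quoted rule concrete, here is a minimal sketch of a loop that only ever forms pointers the clause permits. The function name sum_ints and its parameters are hypothetical and not part of the question's code; the point is that a + n (one past the last element) may be computed and compared, but not dereferenced, whereas nothing comparable exists below the first element.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Every pointer formed here lies in the range a .. a + n,
   which is what C99 6.5.6p8 allows for an array of n elements. */
int sum_ints(const int *a, size_t n) {
    const int *end = a + n;   /* one past the last element: valid to compute */
    int total = 0;
    for (const int *p = a; p != end; ++p) {
        total += *p;          /* p is within [a, end) whenever it is dereferenced */
    }
    return total;
}
```

There is no matching "one before the first element" allowance, which is why todos - 1 falls under the "otherwise" branch of the clause.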
To supplement @ouah's answer, here is what the compiler is actually doing.
Generated assembly for reference:
400450: 48 83 ec 18 sub $0x18,%rsp
400454: be 05 00 00 00 mov $0x5,%esi
400459: 48 8d 44 24 fc lea -0x4(%rsp),%rax
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
400465: 00
400466: 48 83 c0 04 add $0x4,%rax
40046a: 8b 10 mov (%rax),%edx
However, if I add a printf in main():
400450: 48 83 ec 18 sub $0x18,%rsp
400454: bf 84 06 40 00 mov $0x400684,%edi
400459: 31 c0 xor %eax,%eax
40045b: 48 89 e6 mov %rsp,%rsi
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
40046c: 00
40046d: e8 ae ff ff ff callq 400420 <printf@plt>
400472: 48 8d 44 24 fc lea -0x4(%rsp),%rax
400477: be 05 00 00 00 mov $0x5,%esi
40047c: 48 83 c0 04 add $0x4,%rax
400480: 8b 10 mov (%rax),%edx
Specifically, in the printf version, these two instructions populate the todos array:
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
The first of these stores is conspicuously missing from the non-printf version, which for some reason only assigns the second element:
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
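Whatever the optimizer's exact reasoning, the undefined behavior disappears if cur never leaves the array. The following is a minimal sketch of one such rewrite, not the original author's fix: the name f_fixed is hypothetical, and it assumes, as in the question, that the todos array is terminated by a 0 entry.

```c
/* Hypothetical rewrite of f from the question.
   cur starts at todos and is advanced only after each read, so it
   ranges from todos up to one past the terminating 0 entry at most. */
int f_fixed(const int *todos, int input) {
    const int *cur = todos;
    int result = input;
    for (;;) {
        int ch = *cur++;      /* read the current entry, then advance */
        switch (ch) {
        case 0:               /* 0: end of the todo list */
            return result;
        case 1:               /* 1: square the running result */
            result = result * result;
            break;
        }
    }
}
```

Called with the question's todos = { 1, 0 } and input = 5, this returns 25, and because every pointer it forms is within the rules quoted above, the result should not depend on the optimization level.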