Suppose we have:
char* p;
int x;
As recently discussed in another question, arithmetic (including comparison operations) on invalid pointers can generate unexpected behavior in GCC on x86-64 Linux in C++. This new question is specifically about the expression `(p+x)-x`: can it generate unexpected behavior (i.e., a result other than `p`) in any existing GCC version running on x86-64 Linux?

Note that this question is just about the pointer arithmetic; there is absolutely no intention to access the location designated by `*(p+x)`, which obviously would be unpredictable in general.
The practical interest here is non-zero-based arrays. Note that in these applications the addition `(p+x)` and the subtraction of `x` happen in different places in the code. If recent GCC versions on x86-64 can be shown to never generate unexpected behavior for `(p+x)-x`, then those versions can be certified for non-zero-based arrays, and future versions that do generate unexpected behavior could be modified or configured to support this certification.
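To make the pattern concrete, here is a minimal sketch of what is meant (the function and variable names are mine, not from any real code base): the shifted pointer `p + x` is formed and stored in one place, and `p` is recovered somewhere else by subtracting `x` again.

```c++
#include <cassert>

// Sketch only: g_base may point far outside the array, which is where the
// undefined behavior comes from, even though it is never dereferenced.
char *g_base;    // shifted base, stored for later use elsewhere
int   g_shift;

void set_base(char *p, int x) {
    g_base  = p + x;          // possibly out-of-bounds pointer: UB per the standard
    g_shift = x;
}

char *get_original() {
    return g_base - g_shift;  // the question: is this always the original p?
}

int main() {
    static char arr[10];
    set_base(arr, 1000000);           // deliberately way out of bounds
    assert(get_original() == arr);    // expected to hold in practice on GCC/x86-64
}
```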
UPDATE

For the practical case described above, we could also assume that `p` itself is a valid pointer and that `p != NULL`.
You do not understand what "undefined behavior" is, and I cannot blame you, given that it is often poorly explained. The standard defines undefined behavior in section 3.27 ([intro.defs]) simply as behavior for which the standard imposes no requirements.

That's it. Nothing less, nothing more. The standard can be thought of as a series of constraints for compiler vendors to follow when generating valid programs. When there's undefined behavior, all bets are off.
Some people say that undefined behavior can lead to your program spawning dragons or reformatting your hard drive, but I find that to be a bit of a strawman. More realistically, something like going past the bounds of an array can result in a segfault (due to triggering a page fault). Sometimes undefined behavior allows compilers to make optimizations that change the behavior of your program in unexpected ways, since there's nothing saying the compiler can't.
The point is that compilers do not "generate undefined behavior". Undefined behavior exists in your program.
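A concrete illustration (my example, not one given in this answer) of that last point: signed integer overflow is undefined behavior, so GCC is allowed to assume `x + 1 > x` always holds for a signed `int` and fold the comparison to `true` under optimization, which changes the observable result compared to a naive wrapping evaluation.

```c++
#include <climits>
#include <cstdio>

// Signed overflow is UB, so the compiler may assume it never happens and
// fold this comparison to a constant "true" when optimizing.
bool next_is_bigger(int x) {
    return x + 1 > x;
}

int main() {
    // With optimization (e.g. -O2), GCC typically prints 1 here, even though
    // a wrapping two's-complement add would make INT_MAX + 1 negative.
    std::printf("%d\n", next_is_bigger(INT_MAX));
}
```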
Then it would be a non-standard extension, and one would expect it to be documented. I also highly doubt that such a feature would be in high demand, given that it would not only allow people to write unsafe code but would also make it extremely hard to write portable programs.
Here’s a list of gcc extensions. https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
There is an extension for pointer arithmetic: GCC allows performing pointer arithmetic on void pointers. (Not the extension you're asking about.)

So GCC treats the pointer arithmetic you're asking about as undefined under the same conditions as described in the language standard.
You can look through there and see if there is anything I missed that’s relevant to your question.
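For reference, here is a small sketch (mine) of that void-pointer extension; in GNU C/C++ mode, arithmetic on `void *` behaves as if the pointee size were 1, while ISO C and C++ reject it (GCC diagnoses it with `-Wpointer-arith` or `-Wpedantic`).

```c++
#include <cstdio>

int main() {
    int arr[4] = {10, 20, 30, 40};
    void *v = arr;

    // GNU extension: void* arithmetic is done in bytes (sizeof(void) == 1),
    // so adding sizeof(int) lands on the second element.
    void *second = v + sizeof(int);

    std::printf("%d\n", *static_cast<int *>(second));   // prints 20
}
```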
Yes, for gcc5.x and later specifically, that specific expression is optimized very early to just `p`, even with optimization disabled, regardless of any possible runtime UB. This happens even with a static array and compile-time constant size.

`gcc -fsanitize=undefined` doesn't insert any instrumentation to look for it either. Also no warnings at `-Wall -Wextra -Wpedantic`.
Using `gcc -fdump-tree-original` to dump its internal representation of the program logic before any optimization passes shows that in gcc5.x and newer this optimization has already happened by that point (and it happens even at `-O0`). That's checked with gcc8.3 at `-O0` on the Godbolt compiler explorer. The x86-64 asm output at `-O0` accordingly does no pointer math at all; it just returns the incoming `p`.
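A minimal test case for reproducing this (my own sketch; the exact function compiled for this answer isn't shown here, but the `x*4` remark below suggests an `int *` version like this):

```c++
// Build with:  g++ -O0 -fdump-tree-original addsub.cpp -c
// For gcc5.x and later, the -fdump-tree-original output already shows the
// function body as just returning p, and the -O0 asm does no add/sub:
// it only moves the incoming pointer through a stack slot into rax.
int *add_sub(int *p, int x) {
    return (p + x) - x;   // folded to p almost immediately after parsing
}
```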
The `-O3` output is of course just `mov rax, rdi`.
gcc4.9 and earlier only do this optimization in a later pass, and not at `-O0`: the tree dump still includes the subtract, and the x86-64 asm actually performs the add and then the subtract, which lines up with the `-fdump-tree-original` output.

If `x*4` overflows, you'll still get the right answer. In practice I can't think of a way to write a function that would lead to the UB causing an observable change in behaviour.

As part of a larger function, a compiler would be allowed to infer some range info, like that `p[x]` is part of the same object as `p[0]`, so reading memory in between / out that far is allowed and won't segfault, e.g. allowing auto-vectorization of a search loop. But I doubt that gcc even looks for that, let alone takes advantage of it.
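As a hypothetical illustration of that kind of range inference (my example; the answer only speculates that such an inference would be legal, and doubts GCC makes it):

```c++
// Because both p[0] and p[x] are dereferenced, a compiler would be entitled
// to assume the whole range p[0..x] lies within one valid object, so the
// search loop below could in principle be auto-vectorized with full-width
// loads that read "ahead" of the element that eventually matches.
int ends_plus_index_of(const int *p, int x, int key) {
    int ends = p[0] + p[x];          // establishes both endpoints as valid
    for (int i = 0; i <= x; ++i)
        if (p[i] == key) return ends + i;
    return -1;
}
```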
(Note that your question title was specific to gcc targeting x86-64 on Linux, not about whether similar things are safe, e.g. when done in separate statements. I mean yes, that's probably safe in practice too, but it won't be optimized away almost immediately after parsing. And it's definitely not a claim about C++ in general.)
I highly recommend not doing this. Use `uintptr_t` to hold pointer-like values that aren't actual valid pointers, like you're doing in the updates to your answer on "C++ gcc extension for non-zero-based array pointer allocation?".
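A minimal sketch of that approach, assuming the goal is a logically non-zero-based array (the names and layout here are mine): keep the adjusted base as a `uintptr_t`, and only convert back to a real pointer after the index has been added back in, so every pointer you actually form stays inside the array.

```c++
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    static int storage[10];            // real object: valid indices 0..9
    const std::ptrdiff_t lo = 5;       // desired logical indices: 5..14

    // Do the "base minus lo" adjustment in integer arithmetic (well defined
    // for unsigned types) instead of forming an out-of-bounds pointer.
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(storage)
                        - static_cast<std::uintptr_t>(lo) * sizeof(int);

    for (std::ptrdiff_t i = lo; i < lo + 10; ++i) {
        // Convert back to a pointer only after adding the index back in,
        // so the resulting pointer always points into storage.
        int *elem = reinterpret_cast<int *>(
            base + static_cast<std::uintptr_t>(i) * sizeof(int));
        *elem = static_cast<int>(i);
    }

    std::printf("%d %d\n", storage[0], storage[9]);   // prints "5 14"
}
```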