This code:
int *p = nullptr;
p++;
causes undefined behaviour, as discussed in Is incrementing a null pointer well-defined?
But when explaining to colleagues why they should avoid UB, besides saying it is bad because UB means that anything could happen, I like to have some example demonstrating it. I have tons of them for out-of-bounds array access, but I could not find a single one for incrementing a null pointer.
I even tried
#include <cstdint>

int testptr(int *p) {
    intptr_t ip;
    int *p2 = p + 1;
    ip = (intptr_t) p2;
    if (p == nullptr) {
        ip *= 2;
    }
    else {
        ip *= -2;
    }
    return (int) ip;
}
in a separate compilation unit, hoping that an optimizing compiler would skip the test, because when p is null the line int *p2 = p + 1; is UB, and compilers are allowed to assume that code does not contain UB.
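In other words, I was hoping the optimizer would treat the function as if it had been written like this (a purely hypothetical sketch of the transformation, not output from any actual compiler):

#include <cstdint>

// Hypothetical result of the hoped-for optimization: since p + 1 is UB
// when p is null, the compiler may assume p != nullptr and keep only
// the else branch, so a call with a null pointer would return a negative value.
int testptr(int *p) {
    intptr_t ip = (intptr_t) (p + 1);
    return (int) (ip * -2);
}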
But gcc 4.8.2 (I have no usable gcc 4.9) and clang 3.4.1 both answer a positive value!
Could someone suggest some cleverer code, or another optimizing compiler, that exhibits a problem when incrementing a null pointer?
How about this example:
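(The original answer includes a short nulltest.cpp listing that is not reproduced here; based on the description below, it would have looked roughly like this reconstruction -- the array name a and its size are assumptions:)

int a[10];

int main(int argc, char *argv[])
{
    // Point p at the first member of a[] if any command line argument was
    // given, otherwise leave it null.
    int *p = (argc > 1) ? &a[0] : nullptr;
    p++;    // UB when p is null
    p--;
    return (p == nullptr) ? 1 : 0;   // 1 means "p is null"
}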
At face value, this code says: 'If there are any command line arguments, initialise p to point to the first member of a[], otherwise initialise it to null. Then increment it, then decrement it, and tell me if it's null.'

On the face of it this should return '0' (indicating p is non-null) if we supply a command line argument, and '1' (indicating null) if we don't. Note that at no point do we dereference p, and if we supply an argument then p always points within the bounds of a[].

Compiling with the command line clang -S --std=c++11 -O2 nulltest.cpp (Cygwin clang 3.5.1) yields generated code that simply says 'return 0'. It doesn't even bother to check the number of command line args.
(And interestingly, commenting out the decrement has no effect on the generated code.)
An ideal C implementation would, when not being used for the kinds of systems programming that require pointers which the programmer knows to have meaning but the compiler does not, ensure that every pointer was either valid or recognizable as invalid, and would trap any time code either tried to dereference an invalid pointer (including null) or used illegitimate means to create something that wasn't a valid pointer but might be mistaken for one. On most platforms, having generated code enforce such a constraint in all situations would be quite expensive, but guarding against many common erroneous scenarios is much cheaper.
On many platforms, it is relatively inexpensive to have the compiler generate for *foo=23 code equivalent to if (!foo) NULL_POINTER_TRAP(); else *foo=23;. Even primitive compilers in the 1980s often had an option for that. The usefulness of such trapping may be largely lost, however, if compilers allow a null pointer to be incremented in such a fashion that it is no longer recognizable as a null pointer. Consequently, a good compiler should, when error-trapping is enabled, replace foo++; with foo = (foo ? foo+1 : (NULL_POINTER_TRAP(),0));. Arguably, the real "billion dollar mistake" wasn't inventing null pointers, but lay rather with the fact that some compilers would trap direct null-pointer stores, but would not trap null-pointer arithmetic.

Given that an ideal compiler would trap on an attempt to increment a null pointer (many compilers fail to do so for reasons of performance rather than semantics), I can see no reason why code should expect such an increment to have meaning. In just about any case where a programmer might expect a compiler to assign a meaning to such a construct [e.g. ((char*)0)+5 yielding a pointer to address 5], it would be better for the programmer to instead use some other construct to form the desired pointer (e.g. ((char*)5)).
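As a rough source-level illustration of the checked increment described above (the NULL_POINTER_TRAP name comes from the text; the helper itself is hypothetical, since a real compiler would emit the check in generated code rather than require it in source):

#include <cstdio>
#include <cstdlib>

// Stand-in for the hardware/runtime trap the text calls NULL_POINTER_TRAP().
[[noreturn]] static void NULL_POINTER_TRAP() {
    std::fputs("trap: arithmetic on a null pointer\n", stderr);
    std::abort();
}

// Source-level equivalent of replacing "foo++" with
// "foo = (foo ? foo+1 : (NULL_POINTER_TRAP(),0));"
static int *checked_increment(int *foo) {
    return foo ? foo + 1 : (NULL_POINTER_TRAP(), nullptr);
}

int main() {
    int a[2] = {0, 0};
    int *ok = checked_increment(a);   // fine: result still points into a[]
    std::printf("%d\n", *ok);         // prints 0
    checked_increment(nullptr);       // traps instead of yielding an unrecognizable "null+1"
}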
This is just for completeness, but the link proposed by @HansPassant in a comment really deserves to be cited as an answer.
All the references are here; what follows is just some extracts.
This article is about a new memory-safe interpretation of the C abstract machine that provides stronger protection to benefit security and debugging ... [Writers] demonstrate that it is possible for a memory-safe implementation of C to support not just the C abstract machine as specified, but a broader interpretation that is still compatible with existing code. By enforcing the model in hardware, our implementation provides memory safety that can be used to provide high-level security properties for C ...
[Implementation] memory capabilities are represented as the triplet (base, bound, permissions), which is loosely packed into a 256-bit value. Here base provides an offset into a virtual address region, and bound limits the size of the region accessed ... Special capability load and store instructions allow capabilities to be spilled to the stack or stored in data structures, just like pointers ... with the caveat that pointer subtraction is not allowed.
The addition of permissions allows capabilities to be tokens granting certain rights to the referenced memory. For example, a memory capability may have permissions to read data and capabilities, but not to write them (or just to write data but not capabilities). Attempting any of the operations that is not permitted will cause a trap.
[The] results confirm that it is possible to retain the strong semantics of a capability-system memory model (which provides non-bypassable memory protection) without sacrificing the advantages of a low-level language.
(emphasis mine)
That means that even if no production compiler does this yet, research into building one that could trap on incorrect pointer usage exists and has already been published.
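As a very rough software illustration of the (base, bound, permissions) capability model those extracts describe (this is not the actual CHERI encoding, which is a packed 256-bit hardware format; the struct and names here are made up for illustration):

#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Toy model of a memory capability: a pointer that carries its own base,
// bound and permissions, and refuses any access outside them.
struct Capability {
    enum : unsigned { PERM_READ = 1u, PERM_WRITE = 2u };

    std::uintptr_t base;    // start of the virtual address region
    std::size_t    bound;   // size of that region in bytes
    unsigned       perms;   // allowed operations, e.g. PERM_READ | PERM_WRITE

    void check(std::uintptr_t addr, std::size_t len, unsigned need) const {
        // An out-of-bounds access, or one lacking the required permission,
        // throws here where the hardware implementation would trap.
        if ((perms & need) != need || addr < base || len > bound || addr - base > bound - len)
            throw std::runtime_error("capability violation");
    }
};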
Extracted from http://c-faq.com/null/machexamp.html (examples of machines with unusual null pointer representations):
Given that null pointers have a weird bit-pattern representation on the machines quoted there, the code you posted (int *p = nullptr; p++;) would not give the value most people would expect (0 + sizeof(*p)).

Instead you would get a value based on your machine-specific nullptr bit pattern (unless the compiler has a special case for null pointer arithmetic, but since that is not mandated by the standard you'll most likely face Undefined Behaviour with a "visible" concrete effect).
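For what it's worth, here is a tiny program showing that naive expectation; note that it deliberately executes the UB under discussion, so while a typical flat-address-space implementation usually prints sizeof(int), nothing guarantees it:

#include <cstdint>
#include <cstdio>

int main() {
    int *p = nullptr;
    p++;   // undefined behaviour: arithmetic on a null pointer
    // On common implementations where null is all-bits-zero this tends to
    // print sizeof(int) (e.g. 4), but a machine with a nonzero null
    // representation -- or an aggressive optimizer -- may do anything.
    std::printf("%ju\n", (std::uintmax_t)(std::intptr_t)p);
}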