Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL
valued?
For instance, say I have a buffer structure that looks like this:
struct buf {
char *buf;
char *pwrite;
char *pread;
} ex;
Say, ex.buf
points to an array or some malloc'ed memory. If my code always ensures that pwrite
and pread
point within that array or one past it, then I am fairly confident that ex.pwrite - ex.pread
will always be defined. However, what if pwrite
and pread
are both NULL. Can I just expect subtracting the two is defined as (ptrdiff_t)0
or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents a buffer not initialized case). The reason has to do with a fully compliant "available" function given the preceding assumptions are met:
size_t buf_avail(const struct s_buf *b)
{
return b->pwrite - b->pread;
}
In C99, it's technically undefined behavior. C99 §6.5.6 says:
7) For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
[...]
9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [...]
And §6.3.2.3/3 says:
An integer constant expression with the value 0, or such an expression cast to type
void *
, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So since a null pointer is unequal to any object, it violates the preconditions of 6.5.6/9, so it's undefined behavior. But in practicality, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.
In C89, it's also undefined behavior, though the wording of the standard is slightly different.
C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:
If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t
.
C++11 (as well as the latest draft of C++14, n3690) have identical wording to C++03, with just the minor change of std::ptrdiff_t
in place of ptrdiff_t
.
I found this in the C++ standard (5.7 [expr.add] / 7):
If two pointers [...] both are null, and the two pointers are
subtracted, the result compares equal to the value 0 converted to the
type std::ptrdiff_t
As others have said, C99 requires addition/subtraction between 2 pointers be of the same array object. NULL does not point to a valid object which is why you cannot use it in subtraction.
Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.
No, pointer arithmetic is only allowed for pointers that point within the same object. Since by definition of the C standard null pointers don't point to any object, this is undefined behavior.
(Although, I'd guess that any reasonable compiler will return just 0
on it, but who knows.)
The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.
On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p
such that either *p
or *(p-1)
identifies some object:
- For any integer value
z
that equals zero, The pointer values (p+z)
and (p-z)
will be equivalent in every way to p
, except that they will only be constant if both p
and z
are constant.
- For any
q
which is equivalent to p
, the expressions p-q
and q-p
will both yield zero.
Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.
If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.
The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.
Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n
characters starting at address p
is written as:
void out_characters(unsigned char *p, int n)
{
unsigned char *end = p+n;
while(p < end)
out_byte(*p++);
}
older compilers would generate code that would reliably output nothing, with
no side-effect, if p==NULL and n==0, with no need to special-case n==0. On
newer compilers, however, one would have to add extra code:
void out_characters(unsigned char *p, int n)
{
if (n)
{
unsigned char *end = p+n;
while(p < end)
out_byte(*p++);
}
}
which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".