Printing null pointers with %p is undefined behavior

Posted 2020-05-14 04:52

Question:

Is it undefined behavior to print null pointers with the %p conversion specifier?

#include <stdio.h>

int main(void) {
    void *p = NULL;

    printf("%p", p);

    return 0;
}

The question applies to the C standard, and not to C implementations.

Answer 1:

This is one of those weird corner cases where we're subject to the limitations of the English language and inconsistent structure in the standard. So at best, I can make a compelling counter-argument, as it's impossible to prove it :) [1]


The code in the question exhibits well-defined behaviour.

As [7.1.4] is the basis of the question, let's start there:

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, [... other examples ...]) [...] the behavior is undefined. [... other statements ...]

This is clumsy language. One interpretation is that the items in the list are UB for all library functions, unless overridden by the individual descriptions. But the list starts with "such as", indicating that it's illustrative, not exhaustive. For example, it does not mention correct null-termination of strings (critical for the behaviour of e.g. strcpy).

Thus it's clear the intent/scope of 7.1.4 is simply that an "invalid value" leads to UB (unless stated otherwise). We have to look to each function's description to determine what counts as an "invalid value".

Example 1 - strcpy

[7.21.2.3] says only this:

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

It makes no explicit mention of null pointers, but it makes no mention of null terminators either. Instead, one infers from "string pointed to by s2" that the only valid values are strings (i.e. pointers to null-terminated character arrays).

Indeed, this pattern can be seen throughout the individual descriptions. Some other examples:

  • [7.6.4.1 (fegetenv)] store the current floating-point environment in the object pointed to by envp

  • [7.12.6.4 (frexp)] store the integer in the int object pointed to by exp

  • [7.19.5.1 (fclose)] the stream pointed to by stream
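To make the strcpy point concrete, here is a minimal sketch of a checked wrapper. The name copy_if_string and its capacity parameters are hypothetical, not part of the standard library. It treats a source array without a terminating null character as an invalid value, just as it does a null pointer. It does not check for overlapping objects, which strcpy also leaves undefined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only when src really holds a
   string (a terminator exists within src_cap bytes) and the string fits
   in dst_cap bytes. Returns 1 if the copy happened, 0 otherwise. */
int copy_if_string(char *dst, size_t dst_cap, const char *src, size_t src_cap) {
    if (dst == NULL || src == NULL) {
        return 0;                       /* null pointers are not strings */
    }
    const char *end = memchr(src, '\0', src_cap);
    if (end == NULL) {
        return 0;                       /* no terminator: not a string either */
    }
    size_t len = (size_t)(end - src);
    if (len + 1 > dst_cap) {
        return 0;                       /* destination array is too small */
    }
    strcpy(dst, src);                   /* both arguments are now valid values */
    return 1;
}
```

A three-byte array holding 'a', 'b', 'c' with no terminator is rejected here even though the pointer to it is non-null, which is the sense in which "invalid value" depends on the function's description.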

Example 2 - printf

[7.19.6.1] says this about %p:

p - The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.

Null is a valid pointer value, and this section makes no explicit mention that null is a special case, nor that the pointer has to point at an object. Thus it is defined behaviour.
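The one requirement the %p description does state is the argument type. As a small sketch (format_int_address is a hypothetical helper name), a pointer to any other object type should be cast to void * before it is passed, because variadic arguments are not converted automatically:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the %p text for the address of an int into buf.
   The cast is needed because %p requires an argument of type pointer to void.
   Returns whatever snprintf returns. */
int format_int_address(char *buf, size_t size, const int *ip) {
    return snprintf(buf, size, "%p", (void *)ip);
}
```

The resulting text is implementation-defined, so nothing beyond "a sequence of printing characters" should be assumed about its format.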


[1] Unless a standards author comes forward, or unless we can find something similar to a rationale document that clarifies things.



Answer 2:

The Short Answer

Yes. Printing null pointers with the %p conversion specifier has undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.

The answer applies to any of the C standards (C89/C99/C11).


The Long Answer

The %p conversion specifier expects an argument of type pointer to void, and the conversion of the pointer to printable characters is implementation-defined. The description doesn't state that a null pointer is expected.

The introduction to the standard library functions states that null pointers as arguments to (standard library) functions are considered to be invalid values, unless it is explicitly stated otherwise.

C99 / C11 §7.1.4 p1

[...] If an argument to a function has an invalid value (such as [...] a null pointer, [...]) [...] the behavior is undefined.

Examples for (standard library) functions that expect null pointers as valid arguments:

  • fflush() uses a null pointer for flushing "all streams" (that apply).
  • freopen() uses a null pointer for indicating the file "currently associated" with the stream.
  • snprintf() allows a null pointer to be passed when 'n' is zero.
  • realloc() uses a null pointer for allocating a new object.
  • free() allows a null pointer to be passed.
  • strtok() uses a null pointer for subsequent calls.

If we take the case for snprintf(), it makes sense to allow passing a null pointer when 'n' is zero, but this is not the case for other (standard library) functions that allow a similar zero 'n'. For example: memcpy(), memmove(), strncpy(), memset(), memcmp().

The requirement for valid pointer arguments is stated not only in the introduction to the standard library, but once again in the introduction to the string handling functions:

C99 §7.21.1 p2 / C11 §7.24.1 p2

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values as described in 7.1.4.
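Under this reading, code that may end up with a null pointer and a zero length has to avoid the call itself. A minimal sketch, assuming a hypothetical wrapper named copy_bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies n bytes from src to dst, but never calls
   memcpy when n is zero, so null pointers paired with a zero length never
   reach the library function. Returns dst. */
void *copy_bytes(void *dst, const void *src, size_t n) {
    if (n != 0) {
        memcpy(dst, src, n);
    }
    return dst;
}
```

With this guard, copy_bytes(NULL, NULL, 0) simply returns NULL. A non-zero n still requires both pointers to be valid, exactly as for memcpy.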


Is it intentional?

I don't know whether the UB of %p with a null pointer is in fact intentional. However, the standard explicitly states that null pointers are considered invalid values as arguments to standard library functions. It then explicitly specifies the cases where a null pointer is a valid argument (snprintf, free, etc.), and it repeats the requirement for the arguments to be valid even in zero 'n' cases (memcpy, memmove, memset). Given all that, I think it's reasonable to assume that the C standards committee isn't too concerned with having such things undefined.
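Code that wants to stay clear of the question altogether can branch on the null case before formatting. This is only a sketch: format_pointer is a hypothetical helper, and the "(null)" text is an arbitrary choice, not something the standard prescribes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a textual form of p into buf.
   A null pointer is never passed for %p; it gets the fixed text "(null)".
   Returns whatever snprintf returns. */
int format_pointer(char *buf, size_t size, const void *p) {
    if (p == NULL) {
        return snprintf(buf, size, "(null)");
    }
    return snprintf(buf, size, "%p", (void *)p);
}
```

The cost is one extra comparison, and the output for null pointers becomes the same on every implementation.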



Answer 3:

The authors of the C Standard made no effort to exhaustively list all of the behavioral requirements an implementation must meet to be suitable for any particular purpose. Instead, they expected that people writing compilers would exercise a certain amount of common sense whether the Standard requires it or not.

The question of whether something invokes UB is seldom in and of itself useful. The real questions of importance are:

  1. Should someone who is trying to write a quality compiler make it behave in predictable fashion? For the described scenario the answer is clearly yes.

  2. Should programmers be entitled to expect that quality compilers for anything resembling normal platforms will behave in predictable fashion? In the described scenario, I would say the answer is yes.

  3. Might some obtuse compiler writers stretch the interpretation of the Standard so as to justify doing something weird? I would hope not, but wouldn't rule it out.

  4. Should sanitizing compilers squawk about the behavior? That would depend upon the paranoia level of their users; a sanitizing compiler probably shouldn't default to squawking about such behavior, but could perhaps provide a configuration option to do so in case programs might be ported to "clever"/dumb compilers that behave weirdly.

If a reasonable interpretation of the Standard would imply a behavior is defined, but some compiler writers stretch the interpretation to justify doing otherwise, does it really matter what the Standard says?