Undefined behavior: when attempting to access the result of a function call

Posted 2020-02-12 04:27

Question:

The following code compiles and prints "string" as its output.

#include <stdio.h>

struct S { int x; char c[7]; };

struct S bar() {
    struct S s = {42, "string"};
    return s;
}

int main()
{
    printf("%s", bar().c);
}

Apparently this invokes undefined behavior according to

C99 6.5.2.2/5 If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined.

I don't understand what it means by "the next sequence point". What's going on here?

Answer 1:

You've run into a subtle corner of the language.

An expression of array type is, in most contexts, implicitly converted to a pointer to the first element of the array object. The exceptions, none of which apply here, are:

  • When the array expression is the operand of a unary & operator (which yields the address of the entire array);
  • When it's the operand of a unary sizeof operator (sizeof arr yields the size of the array, not the size of a pointer); and
  • When it's a string literal in an initializer used to initialize an array object (char str[6] = "hello"; doesn't convert "hello" to a char*.)

(The N1570 draft incorrectly adds _Alignof to the list of exceptions. In fact, for reasons that are not clear, _Alignof can only be applied to a type name, not to an expression.)

Note that there's an implicit assumption: that the array expression refers to an array object in the first place. In most cases, it does (the simplest case is when the array expression is the name of a declared array object) -- but in this one case, there is no array object.

If a function returns a struct, the struct result is returned by value. In this case, the struct contains an array, giving us an array value with no corresponding array object, at least logically. So the array expression bar().c decays to a pointer to the first element of ... er, um, ... an array object that doesn't exist.

The 2011 ISO C standard addresses this by introducing "temporary lifetime", which applies only to "A non-lvalue expression with structure or union type, where the structure or union contains a member with array type" (N1570 6.2.4p8). Such an object may not be modified, and its lifetime ends at the end of the containing full expression or full declarator.

So as of C2011, your program's behavior is well defined. The printf call gets a pointer to the first element of an array that's part of a struct object with temporary lifetime; that object continues to exist until the printf call finishes.

But as of C99, the behavior is undefined -- not necessarily because of the clause you quote (as far as I can tell, there is no intervening sequence point), but because C99 doesn't define the array object that would be necessary for the printf to work.

If your goal is to get this program to work, rather than to understand why it might fail, you can store the result of the function call in an explicit object:

const struct S result = bar();
printf("%s", result.c);

Now you have a struct object with automatic storage duration and ordinary (rather than temporary) lifetime, so it exists during and after the execution of the printf call.



Answer 2:

The sequence point occurs at the end of the full expression, i.e., when printf returns in this example. There are other places where sequence points occur, too.

Effectively, this rule states that function temporaries do not live beyond the next sequence point, which in this case occurs well after their use, so your program has quite well-defined behaviour.

Here's a simple example of behaviour that is not well defined:

char* c = bar().c; *c = 5; // UB

Here, the sequence point is reached once c has been initialized, and the object it points into no longer exists; we then attempt to write through c (*c = 5), resulting in UB.



Answer 3:

In C99 there is a sequence point at the call to a function, after the arguments have been evaluated (C99 6.5.2.2/10).

So when bar().c is evaluated, the result is a pointer to the first element of the char c[7] array in the struct returned by bar(). However, that pointer is copied into an argument of printf() (an unnamed one, as it happens), and by the time printf() is actually called, the sequence point mentioned above has occurred, so the member the pointer points to may no longer be alive.

As Keith Thompson mentions, C11 (and C++) make stronger guarantees about the lifetime of temporaries, so the behavior under those standards would not be undefined.