Observable behavior and undefined behavior — What

Posted 2019-01-11 12:52

Note: I've seen similar questions, but none of the answers are precise enough, so I'm asking this myself.

This is a very nitpicky "language-lawyer" question; I'm looking for an authoritative answer.

The C++ standard says:

A program may end the lifetime of any object by reusing the storage which the object occupies or by explicitly calling the destructor for an object of a class type with a non-trivial destructor. For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

I simply do not understand what "depends on the side effects" means.

The general question is:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

A specific example to illustrate my point is:

Consider a program like this below. Also consider the obvious variations (e.g. what if I don't construct an object on top of another one but I still forget to call the destructor, what if I don't print the output to observe it, etc.):

#include <new>       // placement new
#include <stdio.h>   // printf
#include <stdlib.h>  // rand, srand
#include <time.h>    // time

struct MakeRandom
{
    int *p;
    MakeRandom(int *p) : p(p) { *p = rand(); }
    ~MakeRandom() { *p ^= rand(); }
};

int main()
{
    srand((unsigned) time(NULL));        // Set a random seed... not so important
    // In C++11 we could use std::random_xyz instead, that's not the point

    int x = 0;
    MakeRandom *r = new MakeRandom(&x);  // Oops, forgot to call the destructor
    new (r) MakeRandom(&x);              // Heck, I'll make another object on top
    r->~MakeRandom();                    // I'll remember to destroy this one!
    printf("%d", x);                     // ... so is this undefined behavior!?!
    // If it's indeed UB: now what if I didn't print anything?
}

It seems ridiculous to me to say this exhibits "undefined behavior", because x is already random -- and therefore XORing it another random number cannot really make the program more "undefined" than before, can it?

Furthermore, at what point is it correct to say the program "depends" on the destructor? Does it do so if the value was random -- or in general, if there is no way for me to distinguish the destructor from running vs. not running? What if I never read the value? Basically:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

Exactly which expression(s) or statement(s) cause this, and why?

12 answers
#2 · 2019-01-11 13:21

Say you have a class that acquires a lock in its constructor and then releases the lock in its destructor. Releasing the lock is a side effect of calling the destructor.

Now, it's your job to ensure that the destructor is called. Typically this is done by calling delete, but you can also call it directly, and this is usually done if you've allocated an object using placement new.

In your example you've allocated two MakeRandom instances, but only called the destructor on one of them. If it were managing some resource (like a file) then you'd have a resource leak.

So, to answer your question, yes, forgetting to call a destructor is different from forgetting to call an ordinary function. A destructor is the inverse of the constructor. You're required to call the constructor, and so you're required to call the destructor in order to "unwind" anything done by the constructor. This isn't the case with an "ordinary" function.
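
A minimal sketch of the lock scenario described above. FlagLock is a hypothetical stand-in for a real lock (a plain bool, so the example stays single-threaded and easy to inspect); neither the class nor the helper functions come from the question. It only illustrates which side effect goes missing when the storage is given up without a destructor call; on typical implementations the flag simply stays set.

```cpp
#include <new>

// Hypothetical stand-in for a lock: the constructor "acquires" by setting
// the flag, the destructor "releases" by clearing it.
struct FlagLock {
    bool& held;
    explicit FlagLock(bool& flag) : held(flag) { held = true; }
    ~FlagLock() { held = false; }
};

// Builds a FlagLock in raw storage and runs its destructor explicitly.
// Returns the flag state afterwards (false means released).
bool lock_with_destructor() {
    bool flag = false;
    alignas(FlagLock) unsigned char storage[sizeof(FlagLock)];
    FlagLock* lock = new (storage) FlagLock(flag);
    lock->~FlagLock();   // the release side effect happens here
    return flag;
}

// Same setup, but the storage goes away without a destructor call.
// Returns the flag state afterwards (true means still "locked").
bool lock_without_destructor() {
    bool flag = false;
    alignas(FlagLock) unsigned char storage[sizeof(FlagLock)];
    new (storage) FlagLock(flag);  // destructor is never called
    return flag;
}
```

Any code that later waits for the flag to clear is the kind of code that "depends on" the destructor's side effect in the sense this answer describes.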

可以哭但决不认输i
#3 · 2019-01-11 13:25

Whether a program "depends on the side effects produced by a destructor" hinges on the definition of "observable behavior".

To quote the standard (section 1.9.8, Program execution, bold face is added):

The least requirements on a conforming implementation are:

  • Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
  • At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
  • The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.

These collectively are referred to as the observable behavior of the program. [ Note: More stringent correspondences between abstract and actual semantics may be defined by each implementation. ]

As for your other question:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

Yes! Forgetting an "equivalent" call to a function leads to well-defined behavior (whatever it was supposed to make happen doesn't happen), but it's quite different for a destructor. In essence, the standard is saying that if you engineer your program such that an observable destructor is "forgotten," then you're no longer writing C++, and your program's result is completely undefined.

Edit: Oh right, the last question:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

I believe printf qualifies as writing to a file, and is therefore observable. Of course, rand() is not actually random; it is completely deterministic for any given seed, so the program as written does exhibit undefined behavior. That said, I would be really surprised if it didn't operate exactly as written; it just doesn't have to.

[This account has been banned]
#4 · 2019-01-11 13:26

The standard is required to speak in such terms as observable behavior and side effects because, although many people forget this, C++ is not just used for PC software.

Consider the example in your comment to Gene's answer:

class S { 
    unsigned char x; 
    public: ~S() { 
        ++x; 
    } 
};

the destructor here is clearly modifying an object -- hence that's a "side effect" with the given definition -- yet I'm pretty sure no program could "depend" on this side effect in any reasonable sense of the term. What am I missing?

You are missing the embedded world, for example. Consider a bare-metal C++ program running on a small processor with special function register access to a UART:

new (address_of_uart_tx_special_function_register) S;

Here, calling the destructor clearly has observable side effects. If we don't call it, the UART transmits one byte less.

Therefore whether side effects are observable also depends on what the hardware is doing with the writes to certain memory locations.

It may also be noteworthy that even if the body of a destructor is empty, it could still have side effects if any of the class's member variables have destructors with side effects.
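
To illustrate the point about empty destructor bodies, here is a small sketch. Noisy, Outer and destroy_outer are hypothetical names invented for the illustration; the log string just makes the member destructors' side effects visible.

```cpp
#include <string>

// Hypothetical member type whose destructor records that it ran.
struct Noisy {
    std::string& log;
    const char* name;
    Noisy(std::string& l, const char* n) : log(l), name(n) {}
    ~Noisy() { log += name; log += ' '; }
};

// The destructor body is empty, yet destroying an Outer still runs the
// destructors of both members, in reverse order of declaration.
struct Outer {
    Noisy first;
    Noisy second;
    explicit Outer(std::string& l) : first(l, "first"), second(l, "second") {}
    ~Outer() {}
};

// Creates and destroys one Outer and returns what the members logged.
std::string destroy_outer() {
    std::string log;
    {
        Outer o(log);
    }   // o is destroyed here; its members write to log
    return log;
}
```

So skipping ~Outer() would also skip both ~Noisy() calls, even though ~Outer() itself contains no statements.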

I don't see anything forbidding the compiler from doing other bookkeeping (maybe with regard to exceptions and stack unwinding). Even if no compiler currently does this and none ever will, from a language-lawyer point of view you still have to consider it UB unless you know that the compiler doesn't create side effects.

淡お忘
#5 · 2019-01-11 13:30

First of all, we need to define undefined behavior, which according to the C FAQ would be when:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

In other words, the programmer cannot predict what will happen once the program is executed. This doesn't mean that the program or the OS will crash; it simply means that the program's future state is only known once it has been executed.

So, explained in math notation, if a program is reduced to a function F which makes a transformation from an initial state Is into a final state Fs, given certain initial conditions Ic


F(Is,Ic) -> Fs


And if you evaluate the function (execute the program) n times, given that n-> ∞


F(Is,Ic) -> Fs1, F(Is,Ic) -> Fs2, ..., F(Is,Ic) -> Fsn, n-> ∞


Then:

  • A defined behavior would be given by all the resulting states being the same: Fs1 = Fs2 = ... = Fsn, given that n-> ∞
  • An undefined behavior would be given by the possibility of obtaining different finished states among different executions. Fs1 ≠ Fs2 ≠ ... ≠ Fsn, given that n-> ∞

Notice how I highlight possibility, because undefined behavior is exactly that. There exists a possibility that the program executes as desired, but nothing guarantees that it would do so, or that it wouldn't do it.

Hence, answering your question:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

Given that a destructor is a function that could be called even when you don't explicitly call it, forgetting to call a destructor IS different from forgetting to call an ordinary function, and doing so COULD lead to undefined behavior.

The justification is given by the fact that, when you forget to call an ordinary function you are SURE, ahead of time, that that function won't be called at any point in your program, even when you run your program an infinite number of times.
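
The difference between the two kinds of call can be sketched as follows. Tracer, cleanup and automatic_object are hypothetical names used only for this illustration.

```cpp
#include <string>

// Hypothetical type: both the destructor and cleanup() append to a log.
struct Tracer {
    std::string& log;
    explicit Tracer(std::string& l) : log(l) { log += "ctor;"; }
    void cleanup() { log += "cleanup;"; }  // ordinary member function
    ~Tracer() { log += "dtor;"; }
};

// Creates an automatic Tracer and lets it go out of scope.
std::string automatic_object() {
    std::string log;
    {
        Tracer t(log);
        // cleanup() is "forgotten": nothing will ever call it.
    }   // ~Tracer() runs implicitly here, without any explicit call
    return log;
}
```

The returned log contains the destructor's entry but never the one from cleanup(), which is the asymmetry this answer relies on.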

However, when you forget to call a destructor and run your program an infinite number of times, the situation is different. As this post shows (https://stackoverflow.com/questions/3179494/under-what-circumstances-are-c-destructors-not-going-to-be-called), under certain circumstances C++ destructors are not called, so you cannot tell beforehand when the destructor will be called and when it won't. This uncertainty means that you cannot guarantee the same final state, which leads to UB.

So answering your second question:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

The program exhibits undefined behavior in exactly those circumstances in which C++ destructors are not called, as listed in the post linked above.

一夜七次
#6 · 2019-01-11 13:34

For this answer, I will be using a 2012 C++11 release of the C++ standard, which can be found here (C++ standard), because this is freely available and up to date.

The three terms used in your question occur as follows:

  1. Destructor - 385 times
  2. Side effect - 71 times
  3. Depends - 41 times

Sadly "depends on the side effect" appears only once, and DEPENDS ON is not an RFC standardized identifier like SHALL, so it's rather hard to pin down what depends means.

Depends on

Let's take an "activist judge" approach, and assume that "depends", "dependency", and "depending" are all used in a similar context in this document, that is, that the language was used to convey a broad idea rather than a legalese concept.

Then we can analyze this portion of page 1194:

17.6.3.2
Change: Function swap moved to a different header
Rationale: Remove dependency on <algorithm> for swap.
Effect on original feature: Valid C++ 2003 code that has been compiled expecting swap to be in <algorithm> may have to instead include <utility>.

This portion indicates a strict sort of dependency; you originally needed to include <algorithm> to get std::swap. "Depends on" therefore indicates a strict requirement, a necessity so to speak, in the sense that there is not sufficient context to proceed without the requirement; failure will occur without the dependency.

I chose this passage because it conveys the intended meaning as clearly as possible; other passages are more verbose, but they all include a similar meaning: necessity.

Therefore, a "depends on" relationship means that the thing being depended on is required for the depending item to make sense, be whole and complete, and be usable in a context.

To cut through that legalese red tape: "A depends on B" means "A requires B". This is basically what you'd understand "depend" to mean if you looked it up in a dictionary or used it in a sentence.

Side effect

This is more strictly defined, on page 10:

Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

This means that anything which results in a change to the environment (such as RAM, network I/O, variables, etc.) is a side effect. This neatly fits with the notion of impurity/purity from functional languages, which is clearly what was intended. Note that the C++ standard does not require that such side effects be observable; modifying a variable in any way, even if that variable is never looked at, is still a side effect.

However, due to the "as if" rule, such unobservable side effects may be removed, page 8:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
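
A small sketch of the distinction between a side effect and observable behavior. The function names are hypothetical, and what an optimizer actually does with the first function depends on the compiler and its settings.

```cpp
// The store to `scratch` is a side effect (it modifies an object), but
// nothing observable depends on it, so an implementation is allowed to
// drop it under the as-if rule.
int twice_plain(int n) {
    int scratch = 0;
    scratch = n + 1;      // may be optimized away
    (void)scratch;        // silences unused-variable warnings
    return 2 * n;
}

// Access through a volatile glvalue counts as observable behavior, so
// this store is expected to be performed as written.
int twice_volatile(int n) {
    volatile int scratch = 0;
    scratch = n + 1;      // part of the observable behavior
    return 2 * n;
}
```

Both functions return the same value; they differ only in whether the intermediate store is something the implementation has to preserve.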

Depends on the side effects

Putting these two definitions together, we can now define this phrase: something depends on the side effects when those changes to the execution environment are required in order to satisfy the senseful, whole, and complete operations of the program. If, without the side effects, some constraint is not satisfied that is required for the program to operate in a standard compliant way, we can say that it depends on the side effects.

A simple example to illustrate this would be, as stated in another answer, a lock. A program that uses locks depends on the side effect of the lock, notably, the side effect of providing a serialized access pattern to some resource (simplified). If this side effect is violated, the constraints of the program are violated, and thus the program cannot be thought of as senseful (since race conditions or other hazards may occur).

The program DEPENDS on the constraints that a lock provides, via side effects; violating those results in a program that is invalid.

Depends on the side effects produced by the destructor

Changing the language from referring to a lock to a destructor is simple and obvious; if the destructor has side effects which satisfy some constraint that is required by the program to be senseful, whole, complete, and usable, then it depends on the side effects produced by the destructor. This is not exactly difficult to understand, and follows quite readily from both a legalese interpretation of the standard and a cursory layman understanding of the words and how they are used.

Now we can get into answering your questions:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

Any time a dependency or requirement is not fulfilled because a destructor is not called, the behavior of any dependent code is undefined. But what does this really mean?

1.3.24 undefined behavior
behavior for which this International Standard imposes no requirements

[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]

Let's suppose for a moment that such behavior WAS defined.

Suppose it was explicitly illegal. This would then require any standard compiler to detect this case, to diagnose it, to deal with it in some fashion. For example, any object not explicitly deleted would have to be deleted at program exit, requiring some sort of tracking mechanism and the ability to issue destructors to arbitrary types, possibly not known at compile time. This is basically a garbage collector, but given that it's possible to hide pointers, to call malloc, and so on, it would be essentially infeasible to require this.

Suppose it was explicitly allowed. This would also allow compilers to remove destructor calls, under the as-if rule, since hey, you can't depend on that behavior anyway. This would result in some nasty surprises, mostly related to memory not freeing very quickly or easily. To get around that, we'd all start using finalizers, and the problem arises yet again. Furthermore, allowing that behavior means that no library can be sure when their memory is recovered or if it ever will be, or if their locks, OS dependent resources, etc etc, will ever get returned. This pushes the requirements for clean up from the code using the resources to the code providing it, where it's basically impossible to deal with in a language like C or C++.

Suppose it had a specific behavior; what behavior would this be? Any such behavior would have to be quite involved or it wouldn't cover the large number of cases. We've already covered two, and the idea of cleaning up for any given object at program exit imposes a large overhead. For a language meant to be fast or at least minimal, this is clearly an unnecessary burden.

So instead, the behavior was labeled undefined, meaning any implementation is free to provide diagnostics, but also free to simply ignore the problem and leave it to you to figure out. But no matter what, if you depend on those constraints being satisfied but fail to call the destructor, you are getting undefined behavior. Even if the program works perfectly well, that behavior is undefined; it may throw an error message in some new version of Clang, it may delete your hard drive in some incredibly secure cryptographic OS of the far flung future, it may work until the end of time.

But it's still undefined.

Your Example

Your example does not satisfy the "depends on" clause; no constraint that is required for the program to run is unsatisfied.

  1. Constructor requires a well formed pointer to a real variable: satisfied
  2. new requires a properly allocated buffer: satisfied
  3. printf requires an accessible variable, interpretable as an integer: satisfied

Nowhere in this program does a certain value for x, or a lack of that value, result in a constraint being unsatisfied; you are not invoking undefined behavior. Nothing "depends" on these side effects; if you were to add a test which functioned as a constraint that required a certain value for "x", then it would be undefined behavior.

As it stands, your example is not undefined behavior; it's merely wrong.
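
For contrast, here is a hedged variation of the question's program in which a later check could plausibly "depend on" the destructor. Counted and live_after_reuse are hypothetical names; the counter replaces rand() so the outcome can be inspected. On mainstream implementations the skipped destructor simply does not run, and the counter typically ends at 1 rather than 0.

```cpp
#include <new>

// Hypothetical variation of the question's class: the constructor
// increments a counter and the destructor decrements it again.
struct Counted {
    int* live;
    explicit Counted(int* p) : live(p) { ++*live; }
    ~Counted() { --*live; }
};

// Mirrors the question's main(): a second object is built on top of the
// first without destroying it, and only the second one is destroyed.
// Returns the counter value at the end.
int live_after_reuse() {
    int live = 0;
    Counted* first = new Counted(&live);          // never destroyed
    Counted* second = new (first) Counted(&live); // storage reused
    second->~Counted();                           // destroys the second object only
    ::operator delete(second);                    // releases the storage itself
    return live;
}
```

Under the reading given in this answer, code that required the counter to be back at 0 before continuing would be depending on the side effect of the destructor that was never called.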

Finally!

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

It is impossible in many cases to define an ordinary function with the same body:

  1. A destructor is a member, not an ordinary function
  2. An ordinary (non-friend) function cannot access private or protected values
  3. A function cannot be required to be called upon destruction
  4. A finalizer also cannot be required to be called upon destruction
  5. An ordinary function cannot restore the memory to the OS without calling the destructor

And no, calling free on an allocated object cannot restore the memory; free/malloc need not work on things allocated with new, and without calling the destructor, the private data members will not be released, resulting in a memory leak.
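
A sketch of the pairing this paragraph is concerned with, under the assumption that the storage is obtained with malloc in the first place. Holder and build_in_malloc_storage are hypothetical names. Storage from malloc is returned with free, while the object inside it is created with placement new and ended with an explicit destructor call.

```cpp
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical type that owns heap memory through a private member.
class Holder {
    std::vector<int> data;
public:
    explicit Holder(std::size_t n) : data(n, 0) {}
    std::size_t size() const { return data.size(); }
};

// Creates a Holder in malloc'ed storage, reads its size, then tears it
// down in two steps: destructor first, storage second.
std::size_t build_in_malloc_storage(std::size_t n) {
    void* raw = std::malloc(sizeof(Holder));
    if (raw == nullptr) return 0;      // allocation failed
    Holder* h = new (raw) Holder(n);
    std::size_t result = h->size();
    h->~Holder();      // releases the memory owned by `data`
    std::free(raw);    // releases the storage of the Holder itself
    return result;
}
```

Leaving out the h->~Holder() line would still free the Holder's own storage, but the buffer owned by the private vector would not be released, which is the leak described above.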

Furthermore, forgetting to call a function will not result in undefined behavior if your program depends on the side effects it imposes; those side effects will simply not be imposed, and your program will not satisfy those constraints, and probably not work as intended. Forgetting to call a destructor, however, results in undefined behavior, as stated on page 66:

For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

As you referenced in your original question. I don't see why you had to ask the question, given you already referenced it, but there you go.

ら.Afraid
#7 · 2019-01-11 13:36

It basically means that when you define your own destructor for a class, it is no longer called automatically upon leaving scope. The object will still be out of scope if you try to use it, but the memory will still be used up in the stack and anything in your non-default destructor will not happen. If you want the count of objects to decrease whenever you call your destructor, for example, it will not happen.
