Observable behavior and undefined behavior — What

Published 2019-01-11 13:30


老娘就宠你


Question:

Note: I've seen similar questions, but none of the answers are precise enough, so I'm asking this myself.

This is a very nitpicky "language-lawyer" question; I'm looking for an authoritative answer.

The C++ standard says:

A program may end the lifetime of any object by reusing the storage which the object occupies or by explicitly calling the destructor for an object of a class type with a non-trivial destructor. For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

I simply do not understand what "depends on the side effects" means.

The general question is:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

A specific example to illustrate my point is:

Consider a program like this below. Also consider the obvious variations (e.g. what if I don't construct an object on top of another one but I still forget to call the destructor, what if I don't print the output to observe it, etc.):

#include <stdio.h>   // printf
#include <stdlib.h>  // rand, srand
#include <time.h>    // time
#include <new>       // placement new

struct MakeRandom
{
    int *p;
    MakeRandom(int *p) : p(p) { *p = rand(); }
    ~MakeRandom() { *p ^= rand(); }
};

int main()
{
    srand((unsigned) time(NULL));        // Set a random seed... not so important
    // In C++11 we could use std::random_xyz instead, that's not the point

    int x = 0;
    MakeRandom *r = new MakeRandom(&x);  // Oops, forgot to call the destructor
    new (r) MakeRandom(&x);              // Heck, I'll make another object on top
    r->~MakeRandom();                    // I'll remember to destroy this one!
    printf("%d", x);                     // ... so is this undefined behavior!?!
    // If it's indeed UB: now what if I didn't print anything?
}

It seems ridiculous to me to say this exhibits "undefined behavior", because x is already random -- and therefore XORing it another random number cannot really make the program more "undefined" than before, can it?

Furthermore, at what point is it correct to say the program "depends" on the destructor? Does it do so if the value was random -- or in general, if there is no way for me to distinguish the destructor from running vs. not running? What if I never read the value? Basically:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

Exactly which expression(s) or statement(s) cause this, and why?

Answer 1:

I simply do not understand what "depends on the side effects" means.

It means that the program depends on something the destructor does: in your example, whether *p is modified or not. Your code has that dependency, because the output would differ if the dtor were not called.

In your current code, the number that is printed might not be the number that the second rand() call would have returned. Your program invokes undefined behavior; it's just that the UB has no ill effect here.

If you didn't print the value (or otherwise read it), there would be no dependency on the side effects of the dtor, and thus no UB.

So:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

Nope, it's not any different in this regard. If you depend on it being called, you must make sure it's called, otherwise your dependency is not satisfied.
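As a minimal sketch of "making sure it's called", here is a variant of the question's program in which every object is destroyed before its storage is reused or released. The names MakeValue, next_value and run_balanced are hypothetical, and a deterministic counter stands in for rand() so that the result can be checked by hand.

```cpp
#include <cassert>
#include <new>

// Hypothetical deterministic stand-in for rand(): yields 1, 2, 3, ...
inline int next_value(int& counter) { return ++counter; }

struct MakeValue {
    int* p;
    int* counter;
    MakeValue(int* p, int* counter) : p(p), counter(counter) { *p = next_value(*counter); }
    ~MakeValue() { *p ^= next_value(*counter); }
};

// Every object is destroyed explicitly before its storage is reused or
// released, so no destructor side effect is skipped.
int run_balanced() {
    int x = 0;
    int counter = 0;
    void* raw = ::operator new(sizeof(MakeValue));     // raw storage only
    MakeValue* r = new (raw) MakeValue(&x, &counter);  // x = 1
    r->~MakeValue();                                   // x = 1 ^ 2 = 3
    assert(x == 3);                                    // intermediate sanity check
    r = new (raw) MakeValue(&x, &counter);             // x = 3
    r->~MakeValue();                                   // x = 3 ^ 4 = 7
    ::operator delete(raw);                            // release storage, no destructor involved
    return x;
}
```

Calling run_balanced() returns 7. Since no destructor call is missing, the clause quoted in the question does not come into play for this variant.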

Furthermore, at what point is it correct to say the program "depends" on the destructor? Does it do so if the value was random -- or in general, if there is no way for me to distinguish the destructor from running vs. not running?

Random or not doesn't matter, because the code depends on the variable being written to. Just because it's difficult to predict what the new value is doesn't mean there's no dependency.

What if I never read the value?

Then there's no UB, as the code has no dependency on the variable after it was written to.

Under which condition(s), if any, does this program exhibit Undefined Behavior?

There are no conditions. As written (with the printf), it's always UB.

Exactly which expression(s) or statement(s) cause this, and why?

The expression:

printf("%d", x);

because it introduces the dependency on the affected variable.

Answer 2:

This makes sense if you accept that the Standard is requiring allocation to be balanced by destruction in the case where destructors affect program behavior. I.e. the only plausible interpretation is that if a program

ever fails to call the destructor (perhaps indirectly through delete) on an object and
said destructor has side-effects,

then the program is doomed to the land of UB. (OTOH, if the destructor doesn't affect program behavior, then you are off the hook. You can skip the call.)

Note added: Side effects are discussed in this SO article, and I'll not repeat that here. A conservative inference is that "program ... depends on destructor" is equivalent to "destructor has a side-effect."

Additional note: However, the Standard seems to allow for a more liberal interpretation. It does not formally define dependence of a program. (It does define a specific quality of expressions as dependence-carrying, but this does not apply here.) Yet in over 100 uses of derivatives of "A depends on B" and "A has a dependency on B," it employs the conventional sense of the word: a variation in B leads directly to variation in A. Consequently, it does not seem a leap to infer that a program P depends on side effect E to the extent that performance or non-performance of E results in a variation in observable behavior during execution of P. Here we are on solid ground. The meaning of a program - its semantics - is equivalent under the Standard to its observable behavior during execution, and this is clearly defined.

The least requirements on a conforming implementation are:

Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.

These collectively are referred to as the observable behavior of the program.

Thus, by the Standard's conventions, if a destructor's side effect would ultimately affect volatile storage access, input, or output, and that destructor is never called, the program has UB.
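To make "a destructor's side effect that reaches output" concrete, here is a small sketch. The class Session and the function run_session_twice are hypothetical; a std::ostringstream stands in for a file or terminal so the output can be inspected, and both objects are destroyed explicitly before the buffer is reused.

```cpp
#include <cassert>
#include <new>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class whose destructor performs output. With a file or an
// interactive device behind the stream, that output is observable behavior.
class Session {
    std::ostream& out_;
public:
    explicit Session(std::ostream& out) : out_(out) { out_ << "open\n"; }
    ~Session() { out_ << "close\n"; }
};

std::string run_session_twice() {
    std::ostringstream log;
    alignas(Session) unsigned char storage[sizeof(Session)];

    Session* s = new (storage) Session(log);
    assert(static_cast<void*>(s) == static_cast<void*>(storage));  // object lives in the buffer
    s->~Session();                    // writes "close" before the storage is reused

    s = new (storage) Session(log);   // use the pointer returned by placement new
    s->~Session();
    return log.str();
}
```

The returned text is "open", "close", "open", "close", one per line. Dropping the first explicit destructor call would remove one "close" line, which is exactly the kind of variation in output this answer is talking about.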

Put yet another way: If your destructors do significant things and aren't consistently called, your program (says the Standard) ought to be considered, and is hereby declared, useless.

Is this overly restrictive, nay pedantic, for a language standard? (After all, the Standard prevents the side-effect from occurring due to an implicit destructor call and then drubs you if the destructor would have caused a variation in observable behavior if it had been called!) Perhaps so. But it does make sense as a way to insist on well-formed programs.

Answer 3:

This is indeed not a very well defined thing in the standard, but I would interpret "depends on" as meaning "the behavior under the rules of the abstract machine is affected".

This behavior consists of the sequence of reads and writes to volatile variables and the calls to library I/O functions (which includes at least the I/O functions of the standard library like printf, but may also include any number of additional functions in any given implementation, e.g. WinAPI functions). See 1.9/9 for the exact wording.

So the behavior is undefined if execution of the destructor or lack thereof affects this behavior. In your example, whether the destructor is executed or not affects the value of x, but that store is dead anyway, since the next constructor call overwrites it, so the compiler could actually optimize it away (and chances are, it will). But more importantly, the call to rand() affects the internal state of the RNG, which influences the values returned by rand() in the other object's constructor and destructor, so it does affect the final value of x. It's "random" (pseudo-random) either way, but it would be a different value. Then you print x, turning that modification into observable behavior, thus making the program undefined.

If you never did anything observable with x or the RNG state, the observable behavior would be unchanged independent of whether the destructor is called or not, so it wouldn't be undefined.
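The point about the RNG state can be illustrated without any destructor at all. The sketch below uses a hypothetical minimal generator, TinyRng (not the implementation of rand()), and models the skipped destructor as one skipped call; the constants are a commonly used linear congruential pair, chosen here only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal linear congruential generator standing in for rand().
struct TinyRng {
    std::uint32_t state;
    explicit TinyRng(std::uint32_t seed) : state(seed) {}
    std::uint32_t next() {
        state = state * 1664525u + 1013904223u;  // 32-bit unsigned arithmetic wraps around
        return state;
    }
};

// Returns the last value drawn. `skipped_call` models a destructor that
// would have drawn one value but never ran.
std::uint32_t final_draw(bool skipped_call) {
    TinyRng rng(42u);
    rng.next();                     // first constructor
    if (!skipped_call) rng.next();  // first destructor (possibly skipped)
    rng.next();                     // second constructor
    return rng.next();              // second destructor
}

// In-block usage: one missing call shifts every later value.
void check_divergence() {
    assert(final_draw(true) != final_draw(false));
}
```

Both results look equally "random", but they are different numbers, so anything that prints them has output that varies with whether the call happened.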

Answer 4:

For this answer, I will be using a 2012 C++11 release of the C++ standard, which can be found here (C++ standard), because this is freely available and up to date.

The following three terms used in your question occur as follows:

Destructor - 385 times
Side effect - 71 times
Depends - 41 times

Sadly "depends on the side effect" appears only once, and DEPENDS ON is not an RFC standardized identifier like SHALL, so it's rather hard to pin down what depends means.

Depends on

Let's take an "activist judge" approach, and assume that "depends", "dependency", and "depending" are all used in a similar context in this document, that is, that the language was used to convey a broad idea rather than to convey a legalese concept.

Then we can analyze this portion of page 1194:

17.6.3.2
Change: Function swap moved to a different header
Rationale: Remove dependency on <algorithm> for swap.
Effect on original feature: Valid C++ 2003 code that has been compiled expecting swap to be in <algorithm> may have to instead include <utility>.

This portion indicates a strict sort of dependency; you originally needed to include <algorithm> to get std::swap. "depends on" therefore indicated a strict requirement, a necessity so to speak, in the sense that there is not sufficient context without the requirement to proceed; failure will occur without the dependency.

I chose this passage because it conveys the intended meaning as clearly as possible; other passages are more verbose, but they all include a similar meaning: necessity.

Therefore, a "depends on" relationship means that the thing being depended on is required for the depending item to make sense, be whole and complete, and be usable in a context.

To cut through that legalese red tape, this means A depends on B means A requires B. This is basically what you'd understand "depend" to mean if you looked it up in a dictionary or spoke it in a sentence.

Side effect

This is more strictly defined, on page 10:

Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

This means that anything which results in a change to the environment (such as RAM, network IO, variables, etc.) is a side effect. This neatly fits with the notion of impurity/purity from functional languages, which is clearly what was intended. Note that the C++ standard does not require that such side effects be observable; modifying a variable in any way, even if that variable is never looked at, is still a side effect.

However, due to the "as if" rule, such unobservable side effects may be removed, page 8:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Depends on the side effects

Putting these two definitions together, we can now define this phrase: something depends on the side effects when those changes to the execution environment are required in order to satisfy the senseful, whole, and complete operations of the program. If, without the side effects, some constraint is not satisfied that is required for the program to operate in a standard compliant way, we can say that it depends on the side effects.

A simple example to illustrate this would be, as stated in another answer, a lock. A program that uses locks depends on the side effect of the lock, notably, the side effect of providing a serialized access pattern to some resource (simplified). If this side effect is violated, the constraints of the program are violated, and thus the program cannot be thought of as senseful (since race conditions or other hazards may occur).

The program DEPENDS on the constraints that a lock provides, via side effects; violating those results in a program that is invalid.
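A minimal sketch of that dependency, assuming a single-threaded toy lock: SimpleLock, Guard and two_sections are hypothetical names, and the flag only models ownership (it is not a real mutex). The second critical section works only because the first guard's destructor released the lock.

```cpp
#include <cassert>

// Hypothetical single-threaded "lock": a flag is enough to show the dependency.
struct SimpleLock {
    bool held = false;
};

// RAII guard: acquiring is the constructor's side effect,
// releasing is the destructor's side effect.
class Guard {
    SimpleLock& lock_;
public:
    explicit Guard(SimpleLock& lock) : lock_(lock) {
        assert(!lock_.held);  // depends on the previous guard having released
        lock_.held = true;
    }
    ~Guard() { lock_.held = false; }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

// Two critical sections in a row; each guard is destroyed at the end of its scope.
bool two_sections(SimpleLock& lock) {
    { Guard g(lock); /* first critical section */ }
    { Guard g(lock); /* second critical section */ }
    return lock.held;  // false: both guards released the lock
}
```

If the first Guard were created with placement new and its destructor never called, the assertion in the second constructor is where the broken constraint would surface.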

Depends on the side effects produced by the destructor

Changing the language from referring to a lock to a destructor is simple and obvious; if the destructor has side effects which satisfy some constraint that is required by the program to be senseful, whole, complete, and usable, then it depends on the side effects produced by the destructor. This is not exactly difficult to understand, and follows quite readily from both a legalese interpretation of the standard and a cursory layman understanding of the words and how they are used.

Now we can get into answering your questions:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

Any time a dependency or requirement is not fulfilled because a destructor is not called, the behavior of any dependent code is undefined. But what does this really mean?

1.3.24 undefined behavior
behavior for which this International Standard imposes no requirements

[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]

Let's suppose for a moment that such behavior WAS defined.

Suppose it was explicitly illegal. This would then require any standard compiler to detect this case, to diagnose it, and to deal with it in some fashion. For example, any object not explicitly deleted would have to be deleted at program exit, requiring some sort of tracking mechanism and the ability to invoke destructors on arbitrary types, possibly not known at compile time. This is basically a garbage collector, but given that it's possible to hide pointers, to call malloc, and so on, it would be essentially infeasible to require this.

Suppose it was explicitly allowed. This would also allow compilers to remove destructor calls, under the as-if rule, since hey, you can't depend on that behavior anyway. This would result in some nasty surprises, mostly related to memory not freeing very quickly or easily. To get around that, we'd all start using finalizers, and the problem arises yet again. Furthermore, allowing that behavior means that no library can be sure when their memory is recovered or if it ever will be, or if their locks, OS dependent resources, etc etc, will ever get returned. This pushes the requirements for clean up from the code using the resources to the code providing it, where it's basically impossible to deal with in a language like C or C++.

Suppose it had a specific behavior; what behavior would this be? Any such behavior would have to be quite involved or it wouldn't cover the large number of cases. We've already covered two, and the idea of cleaning up for any given object at program exit imposes a large overhead. For a language meant to be fast or at least minimal, this is clearly an unnecessary burden.

So instead, the behavior was labeled undefined, meaning any implementation is free to provide diagnostics, but also free to simply ignore the problem and leave it to you to figure out. But no matter what, if you depend on those constraints being satisfied but fail to call the destructor, you are getting undefined behavior. Even if the program works perfectly well, that behavior is undefined; it may throw an error message in some new version of Clang, it may delete your hard drive in some incredibly secure cryptographic OS of the far flung future, it may work until the end of time.

But it's still undefined.

Your Example

Your example does not satisfy the "depends on" clause; no constraint that is required for the program to run is unsatisfied.

Constructor requires a well formed pointer to a real variable: satisfied
new requires a properly allocated buffer: satisfied
printf requires an accessible variable, interpretable as an integer: satisfied

Nowhere in this program does a certain value for x or a lack of that value result in a constraint being dissatisfied; you are not invoking undefined behavior. Nothing "depends" on these side effects; if you were to add a test which functioned as a constraint that required a certain value for "x", then it would be undefined behavior.

As it stands, your example is not undefined behavior; it's merely wrong.

Finally!

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

It is impossible in many cases to define an ordinary function with the same body:

A destructor is a member, not an ordinary function
A function cannot access private or protected values
A function cannot be required to be called upon destruction
A finalizer also cannot be required to be called upon destruction
An ordinary function cannot restore the memory to the OS without calling the destructor

And no, calling free on an allocated object cannot restore the memory; free/malloc need not work on things allocated with new, and without calling the destructor, the private data members will not be released, resulting in a memory leak.

Furthermore, forgetting to call a function will not result in undefined behavior if your program depends on the side effects it imposes; those side effects will simply not be imposed, and your program will not satisfy those constraints, and probably not work as intended. Forgetting to call a destructor, however, results in undefined behavior, as stated on page 66:

For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

As you referenced in your original question. I don't see why you had to ask the question, given you already referenced it, but there you go.

Answer 5:

In the comments you've left a simple question that made me rethink what I said. I've removed the old answer because even if it had some value, it was far from the point.

So you're saying my code is well-defined, since it "doesn't depend on that even if I print it"? No undefined behavior here?

Let me say again that I don't precisely remember the definition of placement new operator and deallocation rules. Actually, I've not even read the newest C++ standard in full. But if the text you quoted is from there, then you are hitting the UB.

Not due to Rand or Print. Or anything we "see".

Any UB that occurs here is because your code assumes that you can safely "overwrite" an old 'object' without destroying the previous instance that was sitting at that place. The core side effect of a destructor is not "freeing handles/resources" (which you do manually in your code!) but leaving the space "ready for being reclaimed/reused".

You have assumed that the usage of the memory chunks and lifetimes of objects are not well-tracked. I'm pretty sure that the C++ standard does not define that they are untracked.

For example, imagine that you have the same code as provided, but that this struct/class has a vtable. Imagine that you are using a hyper-picky compiler with tons of debug checks, one that manages the vtable with extra care, allocates an extra bit flag, and injects code into base constructors and destructors that flips that flag to help trace errors. On such a compiler, this code would crash on the line new (r) MakeRandom, since the first object's lifetime has not been terminated. And I'm pretty sure that such a picky compiler would still be fully C++ compliant, just as your compiler surely is too.

It's UB. It's just that most compilers don't do such checks.
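The bookkeeping imagined in this answer can be sketched as an ordinary library class. LifetimeTracker and second_construction_accepted are hypothetical names, and no real objects or destructors are involved: the tracker only records which addresses are considered "live". Whether a conforming implementation may actually reject the question's code on this basis is this answer's claim, not something the sketch establishes.

```cpp
#include <cassert>
#include <set>

// Hypothetical bookkeeping a checking implementation might keep:
// the set of addresses that currently hold a live object.
class LifetimeTracker {
    std::set<const void*> live_;
public:
    // Called where a constructor would start. Returns false if an object
    // is still recorded as alive at that address.
    bool begin(const void* where) {
        assert(where != nullptr);
        return live_.insert(where).second;
    }
    // Called where a destructor would finish.
    void end(const void* where) { live_.erase(where); }
};

// Models the question's sequence: construct, maybe destroy, construct again.
bool second_construction_accepted(bool destroy_first) {
    LifetimeTracker tracker;
    int slot = 0;  // stands in for the object's storage
    tracker.begin(&slot);
    if (destroy_first) tracker.end(&slot);
    return tracker.begin(&slot);
}
```

With destroy_first set to true the second construction is accepted; with false the tracker reports that the slot is still occupied.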

Answer 6:

First of all, we need to define undefined behavior, which according to the C FAQ would be when:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

In other words, the programmer cannot predict what will happen once the program is executed. This doesn't mean that the program or OS will crash; it simply means that the program's future state will only be known once it has executed.

So, explained in math notation, if a program is reduced to a function F which makes a transformation from an initial state Is into a final state Fs, given certain initial conditions Ic

F(Is,Ic) -> Fs

And if you evaluate the function (execute the program) n times, given that n-> ∞

F(Is,Ic) -> Fs1, F(Is,Ic) -> Fs2, ..., F(Is,Ic) -> Fsn, n-> ∞

Then:

A defined behavior would be given by all the resulting states being the same: Fs1 = Fs2 = ... = Fsn, given that n-> ∞
An undefined behavior would be given by the possibility of obtaining different finished states among different executions. Fs1 ≠ Fs2 ≠ ... ≠ Fsn, given that n-> ∞

Notice how I highlight possibility, because undefined behavior is exactly that. There exists a possibility that the program executes as desired, but nothing guarantees that it would do so, or that it wouldn't do it.

Hence, answering your question:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

Given that a destructor is a function that could be called even when you don't explicitly call it, forgetting to call a destructor IS different from forgetting to call an ordinary function, and doing so COULD lead to undefined behavior.

The justification is given by the fact that, when you forget to call an ordinary function you are SURE, ahead of time, that that function won't be called at any point in your program, even when you run your program an infinite number of times.

However, when you forget to call a destructor and run your program an infinite number of times, then, as exemplified by this post (https://stackoverflow.com/questions/3179494/under-what-circumstances-are-c-destructors-not-going-to-be-called), C++ destructors are not called under certain circumstances. That means you can't tell beforehand when the destructor will be called and when it won't. This uncertainty means that you can't guarantee the same final state, thus leading to UB.

So answering your second question:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

The circumstances would be given by the circumstances when the C++ destructors are not called, given on the link that I referenced.

Answer 7:

Whether a program "depends on the side effects produced by a destructor" hinges on the definition of "observable behavior".

To quote the standard (section 1.9.8, Program execution, bold face is added):

The least requirements on a conforming implementation are:

Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.

These collectively are referred to as the observable behavior of the program. [ Note: More stringent correspondences between abstract and actual semantics may be defined by each implementation. ]

As for your other question:

Is forgetting to call a destructor any different than forgetting to call an ordinary function with the same body?

Yes! Forgetting an "equivalent" call to a function leads to well defined behavior (whatever it was supposed to make happen doesn't happen), but it's quite different for a destructor. In essence, the standard is saying that if you engineer your program such that an observable destructor is "forgotten," then you're no longer writing C++, and your program result is completely undefined.

Edit: Oh right, the last question:

Under which condition(s), if any, does this program exhibit Undefined Behavior?

I believe printf qualifies as writing to a file, and is therefore observable. Of course rand() is not actually random, but is completely deterministic for any given seed, so the program as written does exhibit undefined behavior (that said, I would be really surprised if it didn't operate exactly as written, it just doesn't have to).

Answer 8:

My reading of this portion of the standard is:

You are allowed to reuse the storage for an object that has a non-trivial destructor without calling that destructor
If you do, the compiler is not allowed to call the destructor for you
If your program has logic that depends on the destructor being called, your program might break.

Side effects here are simply changes in program state that result from calling the destructor. They will be things like updating reference counts, releasing locks, closing handles, stuff like that.

'Depends on the side effects' means that another part of the program expects the reference count to be maintained correctly, locks to be released, handles closed and so on. If you make a practice of not calling destructors, you need to make sure your program logic does not depend on them having been called.

Although 'forgetting' is not really relevant, the answer is no, destructors are just functions. The key difference is that under some circumstances they get called by the compiler ('implicitly') and this section of the standard defines a situation in which they will not.

Your example does not really 'depend on the side effects'. It obviously calls the random function exactly 3 times and prints whatever value it calculates. You could change it so:

The struct maintains a reference count (ctor +1, dtor -1)
A factory function reuses objects and randomly calls the destructor or not
A client function 'depends on' the reference count being maintained correctly, by expecting it to be zero.

Obviously, with this dependency the program would exhibit 'undefined behaviour' with respect to the reference count.
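A minimal sketch of the reference-count scenario, restricted to the well-behaved case where every object is destroyed before its buffer is reused. Counted and live_after_reuse are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <new>

// Hypothetical reference-counted type: constructor +1, destructor -1.
struct Counted {
    static int live;
    Counted() { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Reuses one buffer for `n` objects, destroying each before the next is built.
int live_after_reuse(int n) {
    assert(n >= 0);
    alignas(Counted) unsigned char storage[sizeof(Counted)];
    for (int i = 0; i < n; ++i) {
        Counted* c = new (storage) Counted;
        c->~Counted();  // the client's "count is zero" expectation depends on this call
    }
    return Counted::live;
}
```

Here live_after_reuse(3) returns 0, which is what a client expecting a zero count relies on. A client check of that kind is the "dependency" this answer describes: it only holds if the destructor calls are not skipped.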

Please note that 'undefined behaviour' does not have to be bad behaviour. It simply means 'behavior for which this International Standard imposes no requirements'.

I really think there is a danger of overthinking what is fundamentally quite a simple concept. I can't quote any authority beyond the words that are here and the standard itself, which I find quite clear (but by all means tell me if I'm missing something).

Answer 9:

Say you have a class that acquires a lock in its constructor and then releases the lock in its destructor. Releasing the lock is a side effect of calling the destructor.

Now, it's your job to ensure that the destructor is called. Typically this is done by calling delete, but you can also call it directly, and this is usually done if you've allocated an object using placement new.
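The two ways of getting the destructor to run can be put side by side in a short sketch. Resource and destroy_both_ways are hypothetical names; the static counter only exists to make the destructor calls visible.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how many times its destructor has run.
struct Resource {
    static int destroyed;
    ~Resource() { ++destroyed; }
};
int Resource::destroyed = 0;

int destroy_both_ways() {
    Resource::destroyed = 0;

    Resource* heap = new Resource;
    delete heap;                      // runs the destructor, then frees the storage

    alignas(Resource) unsigned char buffer[sizeof(Resource)];
    Resource* placed = new (buffer) Resource;
    placed->~Resource();              // destructor only; the buffer is automatic storage

    assert(Resource::destroyed == 2); // one destructor call per object
    return Resource::destroyed;
}
```

Using delete on the placement-new object would be wrong here, because its storage did not come from a new-expression; the explicit destructor call is the matching operation.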

In your example you've allocated 2 MakeRandom instances, but only called the destructor on one of them. If it were managing some resource (like a file), then you'd have a resource leak.

So, to answer your question, yes, forgetting to call a destructor is different to forgetting to call an ordinary function. A destructor is the inverse of the constructor. You're required to call the constructor, and so you're required to call the destructor in order to "unwind" anything done by the constructor. This isn't the case with an "ordinary" function.

Answer 10:

I have not read everyone else's input, but I have a simple explanation. In the quote

however, if there is no explicit call to the destructor or if a delete-expression is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

The meaning is very different depending on how you parse it. This meaning is what I hear people talking about.

however, { if there is no explicit call to the destructor or if a delete-expression is not used to release the storage }, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

But I think this meaning makes more sense

however, if there is no explicit call to the destructor or { if a delete-expression is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior } .

which basically says C++ does not have a garbage collector and if you assume it does have GC your program will not work as you expect.

Answer 11:

It basically means that when you define your own destructor for a class, it is no longer called automatically upon leaving scope. The object will still be out of scope if you try to use it, but the memory will still be used up in the stack and anything in your non-default destructor will not happen. If you want the count of objects to decrease whenever you call your destructor, for example, it will not happen.

Answer 12:

The standard is required to speak in such terms as observable behavior and side effects because, although many people often forget this, C++ is not just used for PC software.

Consider the example in your comment to Gene's answer:

class S {
    unsigned char x;
public:
    ~S() { ++x; }
};

the destructor here is clearly modifying an object -- hence that's a "side effect" with the given definition -- yet I'm pretty sure no program could "depend" on this side effect in any reasonable sense of the term. What am I missing?

You are missing the embedded world, for example. Consider a bare-metal C++ program running on a small processor with special function register access to a UART:

new (address_of_uart_tx_special_function_register) S;

here calling the destructor clearly has observable side effects. If we don't call it, the UART transmits one byte less.

Therefore whether side effects are observable also depends on what the hardware is doing with the writes to certain memory locations.

It may also be noteworthy that even if the body of a destructor is empty, it could still have side effects if any of the class's member variables have destructors with side effects.
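A short sketch of that last point, with hypothetical names (Handle, Owner, released): the destructor of Owner has an empty body, yet destroying an Owner still runs the destructors of its members.

```cpp
#include <cassert>

static int released = 0;  // hypothetical counter observed by the rest of the program

struct Handle {
    ~Handle() { ++released; }  // the side effect lives in the member's destructor
};

struct Owner {
    Handle first;
    Handle second;
    ~Owner() {}  // empty body; member destructors still run afterwards
};

int release_count_after_scope() {
    released = 0;
    { Owner o; }              // o is destroyed at the closing brace
    assert(released == 2);    // one increment per Handle member
    return released;
}
```

So "the destructor does nothing" cannot be judged from the body alone; the members and base classes have to be taken into account as well.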

I don't see anything forbidding the compiler from doing other bookkeeping (maybe with regard to exceptions and stack unwinding). Even if no compiler currently does so and no compiler ever will, from a language-lawyer point of view you still have to consider it UB unless you know that the compiler doesn't create side effects.