Are memory leaks “undefined behavior” class proble

2020-02-06 18:27发布

站内文章 / C++

26 0

做个烂人

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Turns out many innocently looking things are undefined behavior in C++. For example, once a non-null pointer has been delete'd even printing out that pointer value is undefined behavior.

Now memory leaks are definitely bad. But what class situation are they - defined, undefined or what other class of behavior?

回答1:

Memory leaks.

There is no undefined behavior. It is perfectly legal to leak memory.

Undefined behavior: is actions the standard specifically does not want to define and leaves upto the implementation so that it is flexible to perform certain types of optimizations without breaking the standard.

Memory management is well defined.
If you dynamically allocate memory and don't release it. Then the memory remains the property of the application to manage as it sees fit. The fact that you have lost all references to that portion of memory is neither here nor there.

Of course if you continue to leak then you will eventually run out of available memory and the application will start to throw bad_alloc exceptions. But that is another issue.

回答2:

Memory leaks are definitely defined in C/C++.

If I do:

int *a = new int[10];

followed by

a = new int[10];

I'm definitely leaking memory as there is no way to access the 1st allocated array and this memory is not automatically freed as GC is not supported.

But the consequences of this leak are unpredictable and will vary from application to application and from machine to machine for a same given application. Say an application that crashes out due to leaking on one machine might work just fine on another machine with more RAM. Also for a given application on a given machine the crash due to leak can appear at different times during the run.

回答3:

If you leak memory, execution proceeds as if nothing happens. This is defined behavior.

Down the track, you may find that a call to malloc fails due to there not being enough available memory. But this is a defined behavior of malloc, and the consequences are also well-defined: the malloc call returns NULL.

Now this may cause a program that doesn't check the result of malloc to fail with a segmentation violation. But that undefined behavior is (from the POV of the language specs) due to the program dereferencing an invalid pointer, not the earlier memory leak or the failed malloc call.

回答4:

My interpretation of this statement:

For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

is as follows:

If you somehow manage to free the storage which the object occupies without calling the destructor on the object that occupied the memory, UB is the consequence, if the destructor is non-trivial and has side-effects.

If new allocates with malloc, the raw storage could be released with free(), the destructor would not run, and UB would result. Or if a pointer is cast to an unrelated type and deleted, the memory is freed, but the wrong destructor runs, UB.

This is not the same as an omitted delete, where the underlying memory is not freed. Omitting delete is not UB.

回答5:

(Comment below "Heads-up: this answer has been moved here from Does a memory leak cause undefined behaviour?" - you'll probably have to read that question to get proper background for this answer O_o).

It seems to me that this part of the Standard explicitly permits:

having a custom memory pool that you placement-new objects into, then release/reuse the whole thing without spending time calling their destructors, as long as you don't depend on side-effects of the object destructors.
libraries that allocate a bit of memory and never release it, probably because their functions/objects could be used by destructors of static objects and registered on-exit handlers, and it's not worth buying into the whole orchestrated-order-of-destruction or transient "phoenix"-like rebirth each time those accesses happen.

I can't understand why the Standard chooses to leave the behaviour undefined when there are dependencies on side effects - rather than simply say those side effects won't have happened and let the program have defined or undefined behaviour as you'd normally expect given that premise.

We can still consider what the Standard says is undefined behaviour. The crucial part is:

"depends on the side effects produced by the destructor has undefined behavior."

The Standard §1.9/12 explicitly defines side effects as follows (the italics below are the Standards, indicating the introduction of a formal definition):

Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

In your program, there's no dependency so no undefined behaviour.

One example of dependency arguably matching the scenario in §3.8 p4, where the need for or cause of undefined behaviour isn't apparent, is:

struct X
{
    ~X() { std::cout << "bye!\n"; }
};

int main()
{
     new X();
}

An issue people are debating is whether the X object above would be considered released for the purposes of 3.8 p4, given it's probably only released to the O.S. after program termination - it's not clear from reading the Standard whether that stage of a process's "lifetime" is in scope for the Standard's behavioural requirements (my quick search of the Standard didn't clarify this). I'd personally hazard that 3.8p4 applies here, partly because as long as it's ambiguous enough to be argued a compiler writer may feel entitled to allow undefined behaviour in this scenario, but even if the above code doesn't constitute release the scenario's easily amended ala...

int main()
{
     X* p = new X();
     *(char*)p = 'x';   // token memory reuse...
}

Anyway, however main's implemented the destructor above has a side effect - per "calling a library I/O function"; further, the program's observable behaviour arguably "depends on" it in the sense that buffers that would be affected by the destructor were it to have run are flushed during termination. But is "depends on the side effects" only meant to allude to situations where the program would clearly have undefined behaviour if the destructor didn't run? I'd err on the side of the former, particularly as the latter case wouldn't need a dedicated paragraph in the Standard to document that the behaviour is undefined. Here's an example with obviously-undefined behaviour:

int* p_;

struct X
{
    ~X() { if (b_) p_ = 0; else delete p_; }
    bool b_;
};

X x{true};

int main()
{
     p_ = new int();
     delete p_; // p_ now holds freed pointer
     new (&x){false};  // reuse x without calling destructor
}

When x's destructor is called during termination, b_ will be false and ~X() will therefore delete p_ for an already-freed pointer, creating undefined behaviour. If x.~X(); had been called before reuse, p_ would have been set to 0 and deletion would have been safe. In that sense, the program's correct behaviour could be said to depend on the destructor, and the behaviour is clearly undefined, but have we just crafted a program that matches 3.8p4's described behaviour in its own right, rather than having the behaviour be a consequence of 3.8p4...?

More sophisticated scenarios with issues - too long to provide code for - might include e.g. a weird C++ library with reference counters inside file stream objects that had to hit 0 to trigger some processing such as flushing I/O or joining of background threads etc. - where failure to do those things risked not only failing to perform output explicitly requested by the destructor, but also failing to output other buffered output from the stream, or on some OS with a transactional filesystem might result in a rollback of earlier I/O - such issues could change observable program behaviour or even leave the program hung.

Note: it's not necessary to prove that there's any actual code that behaves strangely on any existing compiler/system; the Standard clearly reserves the right for compilers to have undefined behaviour... that's all that matters. This is not something you can reason about and choose to ignore the Standard - it may be that C++14 or some other revision changes this stipulation, but as long as it's there then if there's even arguably some "dependency" on side effects then there's the potential for undefined behaviour (which of course is itself allowed to be defined by a particular compiler/implementation, so it doesn't automatically mean that every compiler is obliged do something bizarre).

回答6:

The language specification says nothing about "memory leaks". From the language point of view, when you create an object in dynamic memory, you are doing just that: you are creating an anonymous object with unlimited lifetime/storage duration. "Unlimited" in this case means that the object can only end its lifetime/storage duration when you explicitly deallocate it, but otherwise it continues to live forever (as long as the program runs).

Now, we usually consider a dynamically allocated object become a "memory leak" at the point in program execution when all references (generic "references", like pointers) to that object are lost to the point of being unrecoverable. Note, that even to a human the notion of "all references being lost" is not very precisely defined. What if we have a reference to some part of the object, which can be theoretically "recalculated" to a reference to the entire object? Is it a memory leak or not? What if we have no references to the object whatsoever, but somehow we can calculate such a reference using some other information available to the program (like precise sequence of allocations)?

The language specification doesn't concern itself with issues like that. Whatever you consider an appearance of "memory leak" in your program, from the language point of view it is a non-event at all. From the language point of view a "leaked" dynamically allocated object just continues to live happily until the program ends. This is the only remaining point of concern: what happens when program ends and some dynamic memory is still allocated?

If I remember correctly, the language does not specify what happens to dynamic memory which is still allocated the moment of program termination. No attempts will be made to automatically destruct/deallocate the objects you created in dynamic memory. But there's no formal undefined behavior in cases like that.

回答7:

The burden of evidence is on those who would think a memory leak could be C++ UB.

Naturally no evidence has been presented.

In short for anyone harboring any doubt this question can never be clearly resolved, except by very credibly threatening the committee with e.g. loud Justin Bieber music, so that they add a C++14 statement that clarifies that it's not UB.

At issue is C++11 §3.8/4:

” For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

This passage had the exact same wording in C++98 and C++03. What does it mean?

the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released

– means that one can grab the memory of a variable and reuse that memory, without first destroying the existing object.
if there is no explicit call to the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called

– means if one does not destroy the existing object before the memory reuse, then if the object is such that its destructor is automatically called (e.g. a local automatic variable) then the program has Undefined Behavior, because that destructor would then operate on a no longer existing object.
and any program that depends on the side effects produced by the destructor has undefined behavior

– can't mean literally what it says, because a program always depends on any side effects, by the definition of side effect. Or in other words, there is no way for the program not to depend on the side effects, because then they would not be side effects.

Most likely what was intended was not what finally made its way into C++98, so that what we have at hand is a defect.

From the context one can guess that if a program relies on the automatic destruction of an object of statically known type T, where the memory has been reused to create an object or objects that is not a T object, then that's Undefined Behavior.

Those who have followed the commentary may notice that the above explanation of the word “shall” is not the meaning that I assumed earlier. As I see it now, the “shall” is not a requirement on the implementation, what it's allowed to do. It's a requirement on the program, what the code is allowed to do.

Thus, this is formally UB:

auto main() -> int
{
    string s( 666, '#' );
    new( &s ) string( 42, '-' );    //  <- Storage reuse.
    cout << s << endl;
    //  <- Formal UB, because original destructor implicitly invoked.
}

But this is OK with a literal interpretation:

auto main() -> int
{
    string s( 666, '#' );
    s.~string();
    new( &s ) string( 42, '-' );    //  <- Storage reuse.
    cout << s << endl;
    //  OK, because of the explicit destruction of the original object.
}

A main problem is that with a literal interpretation of the standard's paragraph above it would still be formally OK if the placement new created an object of a different type there, just because of the explicit destruction of the original. But it would not be very in-practice OK in that case. Maybe this is covered by some other paragraph in the standard, so that it is also formally UB.

And this is also OK, using the placement new from <new>:

auto main() -> int
{
    char* storage   = new char[sizeof( string )];
    new( storage ) string( 666, '#' );
    string const& s = *(
        new( storage ) string( 42, '-' )    //  <- Storage reuse.
        );
    cout << s << endl;
    //  OK, because no implicit call of original object's destructor.
}

As I see it – now.

回答8:

Its definately defined behaviour.

Consider a case the server is running and keep allocating heap memory and no memory is released even if there's no use of it. Hence the end result would be that eventually server will run out of memory and definately crash will occur.

回答9:

Adding to all the other answers, some entirely different approach. Looking at memory allocation in § 5.3.4-18 we can see:

If any part of the object initialization described above⁷⁶ terminates by throwing an exception and a suitable deallocation function can be found, the deallocation function is called to free the memory in which the object was being constructed, after which the exception continues to propagate in the context of the new-expression. If no unambiguous matching deallocation function can be found, propagating the exception does not cause the object’s memory to be freed. [ Note: This is appropriate when the called allocation function does not allocate memory; otherwise, it is likely to result in a memory leak. —end note ]

Would it cause UB here, it would be mentioned, so it is "just a memory leak".

In places like §20.6.4-10, a possible garbage collector and leak detector is mentioned. A lot of thought has been put into the concept of safely derived pointers et.al. to be able to use C++ with a garbage collector (C.2.10 "Minimal support for garbage-collected regions").

Thus if it would be UB to just lose the last pointer to some object, all the effort would make no sense.

Regarding the "when the destructor has side effects not running it ever UB" I would say this is wrong, otherwise facilities such as std::quick_exit() would be inherently UB too.

回答10:

If the space shuttle must take off in two minutes, and I have a choice between putting it up with code that leaks memory and code that has undefined behavior, I'm putting in the code that leaks memory.

But most of us aren't usually in such a situation, and if we are, it's probably by a failure further up the line. Perhaps I'm wrong, but I'm reading this question as, "Which sin will get me into hell faster?"

Probably the undefined behavior, but in reality both.

回答11:

defined, since a memory leak is you forgetting to clean up after yourself.

of course, a memory leak can probably cause undefined behaviour later.

回答12:

Straight forward answer: The standard doesn't define what happens when you leak memory, thus it is "undefined". It's implicitly undefined though, which is less interesting than the explicitly undefined things in the standard.

回答13:

This obviously cannot be undefined behaviour. Simply because UB has to happen at some point in time, and forgetting to release memory or call a destructor does not happen at any point in time. What happens is just that the program terminates without ever having released memory or called the destructor; this does not make the behaviour of the program, or of its termination, undefined in any way.

This being said, in my opinion the standard is contradicting itself in this passage. On one hand it ensures that the destructor will not be called in this scenario, and on the other hand it says that if the program depends on the side effects produced by the destructor then it has undefined behaviour. Suppose the destructor calls exit, then no program that does anything can pretend to be independent of that, because the side effect of calling the destructor would prevent it from doing what it would otherwise do; but the text also assures that the destructor will not be called so that the program can go on with doing its stuff undisturbed. I think the only reasonable way to read the end of this passage is that if the proper behaviour of the program would require the destructor to be called, then behaviour is in fact not defined; this then is a superfluous remark, given that it has just been stipulated that the destructor will not be called.

回答14:

Undefined behavior means, what will happen has not been defined or is unknown. The behavior of memory leaks is definitly known in C/C++ to eat away at available memory. The resulting problems, however, can not always be defined and vary as described by gameover.