What is wrong with using arrays dynamically alloca

2019-01-23 01:27发布

站内文章 / C++

42 0

放荡不羁爱自由

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This question already has an answer here:

Why should C++ programmers minimize use of 'new'? 17 answers
Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization) 11 answers

Like the following code :

int size = myGetSize();
std::string* foo;
foo = new std::string[size];
//...
// using the table
//...
delete[] foo;

I heard that such use (not this code precisely, but dynamic allocation as a whole) can be unsafe in some cases, and should be used only with RAII. Why?

回答1:

I see three main problems with your code:

Use of naked, owning pointers.
Use of naked new.
Use of dynamic arrays.

Each is undesirable for its own reasons. I will try to explain each one in turn.

(1) violates what I like to call subexpression-wise correctness, and (2) violates statement-wise correctness. The idea here is that no statement, and not even any subexpression, should by itself be an error. I take the term "error" loosely to mean "could be a bug".

The idea of writing good code is that if it goes wrong, it wasn't your fault. Your basic mindset should be that of a paranoid coward. Not writing code at all is one way to achieve this, but since that rarely meets requirements, the next best thing is to make sure that whatever you do, it Isn't Your Fault. The only way you can systematically prove that it's not your fault is if no single part of your code is the root cause of an error. Now let's look at the code again:

new std::string[25] is an error, because it creates a dynamically allocated object which is leaked. This code can only conditionally become a non-error if someone else, somewhere else, and in every case, remembers to clean up.

This requires, first of all, that the value of this expression be stored somewhere. This is happening in your case, but in more complex expressions it may be hard to prove that it will ever happen in all cases (unspecified evaluation order, I'm looking at you).
foo = new std::string[125]; is an error because again foo leaks a resource, unless the stars align and someone remembers, in every case and at the right time, to clean up.

The correct way of writing this code so far would be:

std::unique_ptr<std::string[]> foo(std::make_unique<std::string[]>(25));

Note that every single subexpression in this statement is not the root cause of a program bug. It Is Not Your Fault.

Finally, as for (3), dynamic arrays are a misfeature in C++ and should basically never be used. There are several standard defects relating just to dynamic arrays (and not considered worth fixing). The simple argument is that you cannot use arrays without knowing their size. You might say that you could use a sentinel or tombstone value to mark the end of an array dynamically, but that makes the correctness of your program value-dependent, not type-dependent, and thus not statically checkable (the very definition of "unsafe"). You cannot assert statically that It Wasn't Your Fault.

So you end up having to maintain a separate storage for the array size anyway. And guess what, your implementation has to duplicate that knowledge anyway so it can call destructors when you say delete[], so that's wasted duplication. The correct way, instead, is not to use dynamic arrays, but instead separate memory allocation (and make it customizable via allocators why we're at it) from element-wise object construction. Wrapping all this (allocator, storage, element count) into a single, convenient class is the C++ way.

Thus the final version of your code is this:

std::vector<std::string> foo(25);

回答2:

The code you propose is not exception-safe, and the alternative:

std::vector<std::string> foo( 125 );
//  no delete necessary

is. And of course, the vector knows the size later, and can do bounds checking in debug mode; it can be passed (by reference or even by value) to a function, which will then be able to use it, without any additional arguments. Array new follows the C conventions for arrays, and arrays in C are seriously broken.

As far as I can see, there is never a case where an array new is appropriate.

回答3:

I heard that such use (not this code precisely, but dynamic allocation as a whole) can be unsafe in some cases, and should be used only with RAII. Why?

Take this example (similar to yours):

int f()
{
    char *local_buffer = new char[125];
    get_network_data(local_buffer);
    int x = make_computation(local_buffer);
    delete [] local_buffer;
    return x;
}

This is trivial.

Even if you write the code above correctly, somebody may come one year later, and add a conditional, or ten or twenty, in your function:

int f()
{
    char *local_buffer = new char[125];
    get_network_data(local_buffer);
    int x = make_computation(local_buffer);
    if(x == 25)
    {
        delete[] local_buffer;   
        return 2;
    }
    if(x < 0)
    {
        delete[] local_buffer; // oops: duplicated code
        return -x;
    }
    if(x || 4)
    {
        return x/4; // oops: developer forgot to add the delete line
    }
    delete[] local_buffer; // triplicated code
    return x;
}

Now, making sure the code has no memory leaks is more complicated: you have multiple code paths and each of them has to repeat the delete statement (and I introduced a memory leak on purpose, to give you an example).

This is still a trivial case, with only one resource (local_buffer), and it (naively) assumes the code throws no exceptions whatsoever, between the allocation and deallocation. The problem leads to unmaintainable code, when your function allocates ~10 local resources, can throw, and has multiple return paths.

More than that, the progression above (simple, trivial case extended to more complex function with multiple exit paths, extended to multiple resources and so on) is a natural progression of code in the development of most projects. Not using RAII, creates a natural way for developers to update the code, in a way that will decrease quality, over the lifetime of the project (this is called cruft, and is a Very Bad Thing).

TLDR: Using raw pointers in C++ for memory management is a bad practice (altough for implementing an observer role, an implementation with raw pointers, is fine). Resource management with raw poiners violates SRP and DRY principles).

回答4:

There are two major downsides of it -

new does not guarantee that the memory you are allocating is initialized with 0s or null. They will have undefined values unless you initialize them.
Secondly, the memory is dynamically allocated, which means it is hosted in heap not in stack. The difference between heap and stack is that, stacks are cleared when the variable runs out of scope but heaps are not cleared automatically and also C++ does not contain a built in Garbage Collector, which means if any how the delete call is missed you are ended up with a memory leak.

回答5:

the raw pointer is difficult to handle correctly, e.g. wrt. copying of objects.

it's much simpler and safer to use a well-tested abstraction such as std::vector.

in short, don't needlessly reinvent the wheel – others have already created some superb wheels that you're not likely to match in quality or price

回答6:

If the allocated memory is not freed when it's no longer necessary it will result in a memory leak. It is not specified what will happen to the leaked memory, but contemporary operating systems collect it when the program terminates. Memory leaks can be very dangerous because the system may run out of memory.

回答7:

The delete in the end could be skipped. The code shown is not "wrong" in the strictest sense, but C++ offers automatic memory management for variables as soon as their scope is left; using a pointer is not necessary in your example.

回答8:

Have the allocation within a try block and the catch block should deallocate all allocated memory thus far and also on normal exit outside the exception block, and catch block should not fall through the normal execution block to avoid double deletion

回答9:

See JPL Coding standards. Dynamic memory allocation leads to unpredictable execution. I have seen problems from dynamic memory allocations in perfectly coded systems - that over time, there is memory fragmentation just like a hard disk. Allocating blocks of memory from the heap will take longer and longer, until it becomes impossible to allocate the requested size. At which point in time, you start getting NULL pointers returned and the entire program crashes because few if anyone tests for out-of-memory conditions. It is important to note, that by the book, you may have enough memory available, however the fragmentation thereof is what prevents allocation. This is addressed in .NET CLI, with the use of "handles" instead of pointers, where the runtime can garbage collect, using a mark-and-sweep garbage collector, move the memory around. During the sweep it compacts the memory to prevent fragmentation and updates the handles. Whereas pointers (memory addresses) cannot be updated. This is a problem though, because garbage collection is no longer deterministic. Though, .NET has added mechanisms to make it more deterministic. However, if you follow JPL's advice (section 2.5), you do not need a fancy garbage collection. You dynamically allocate all you need at initialization, then reuse the allocated memory, never freeing it, then there is no fragmentation risk and you can still have deterministic garbage collection.