Why does std::vector reserve not “double” its capa

2019-02-07 19:46发布

站内文章 / C++

47 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I just found out that std::vector<T>::resize "doubles" its capacity even when resizing to one element above the current size:

std::vector<int> v(50);
v.resize(51);
std::cout << v.capacity() << std::endl;

This program outputs 100 with GCC and Clang, and 75 with Visual C++. However, when I switch from resize to reserve:

std::vector<int> v(50);
v.reserve(51);
std::cout << v.capacity() << std::endl;

The output is 51 with all three compilers.

I wonder why implementations use a different expansion strategy for resize and reserve. It seems inconsistent, and I would expect the same behavior here.

I am just adding a link to a motivation for my question, where the impact on performance is reported: Why are C++ STL vectors 1000x slower when doing many reserves?

Adding a quote from C++11 Standard to clarify requirements for reserve; §23.3.6.3(2):

After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens...

Some additional thoughts: From C++11 Standard:

Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.

Which, effectively, implies constant (amortized) complexity for inserting a single element at the end. However, this applies only for vector modifiers, such as push_back or insert (§23.3.6.5).

resize is not listed among modifiers. It's listed in §23.3.6.3 vector capacity section. And, there are no complexity requirements for resize.

However, in the vector overview section (§23.3.6.1), there is written:

it (vector) supports (amortized) constant time insert and erase operations at the end

The question is whether resize(size()+1) is considered to be "insertion at the end".

回答1:

As far as I can tell, neither resize nor reserve is required to have the demonstrated behaviour. Both are however allowed such behaviour although both could either allocate the exact amount, and both could multiply the previous allocation as far as the standard is concerned.

Each allocation strategies have their advantages. The advantage of allocating exact amount is that it has no memory overhead when the maximum allocation is known beforehand. The advantage of multiplying is that it maintains the constant amortized property when mixed with end-insertion operations.

The approach chosen by the tested implementations has the advantage that it allows both strategies when resizing. To use one strategy, one can reserve and then resize. To use the other, just resize. Of course, one has to be aware of the unspecified behaviour to take advantage of this. This advantage may or might not be the reasoning behind the choice of these implementations.

One might consider it a failure of the vector API, as specified in the standard, that expressing the intended reallocation behaviour is not possible (in a way that is guaranteed by the standard).

回答2:

When you resize more than there is capacity you already "demonstrate" that you don't want to reserve just the right capacity. On the other hand, if you use reserve you explicitly ask for the right capacity. If reserve would use the same strategy as resize there would be no way to reserve just the right amount.

In this sense resize without reserve is for the lazy ones or in case you don't know the exact amount to reserve. You call reserve if you know what capacity you need. That's two different scenarios.

PS: As StoryTeller pointed out, also reserve is not required to reserve the exact amount that is asked for as per the standard. Nevertheless I think my main argument still holds: resize (without reserve) and reserve are meant for different scenarios, where you either give a hint of how much you want to reserve or don't care about the actual capacity and just want to have the container sized to what you ask for.

回答3:

Why would you expect them to behave the same? reserve is used to preallocate space you will use later, with the expectation that the user has a decent handle on the expected final size of the container. resize is simply a normal allocation, and so it follows the normal, speed efficient, approach of geometrically increasing the container's allocated space.

Containers increase in size by multiplicative steps to reduce the number of allocations needed and thus maintain speed and reduce memory fragmentation. Doubling is the most common, but some implementations use steps of 1.5 (e.g. MSVC) which trade increased allocations for lower wasted space within each container.

But, if the user has already told the library how big they think the container will get - by causing reserve - there's no need to allocate excess space, they can instead trust the user to have called it with the correct number. It is reserve that has the unusual behaviour, not resize.

回答4:

resize is required to follow an exponential reallocation strategy to fulfil its complexity guarantee (linear in the number of elements inserted). This can be seen by considering that resize(size() + 1) is required to have amortized constant complexity, so must follow exponential growth for the same reason that push_back (amortized constant complexity) must grow exponentially.

An implementation of reserve is permitted to follow whatever allocation strategy it likes, since its only complexity requirement is that it be linear in the number of elements present. However, if an implementation were e.g. to round up to the next power of two, this would be space-inefficient (and surprising) in the case where the user knows exactly how much memory is required, and could complicate porting if the user comes to rely on this behavior. The latitude in the Standard is better exercised in cases where there is no space inefficiency e.g. by rounding up allocations to the word size, if the allocator operates at that granularity.

回答5:

reserve changes the capacity, while resize changes the size.

capacity is the number of elements that the container has currently allocated space for.

size is the number of elements in the container.

When you allocate an empty vector you get a default capacity (AKA space). The size is still 0, and when you add elements into the vector, its size increment. When size is equal to capacity and you add more item the capacity must grow (usually double itself).

The problem with vector is that it ensures sequential memory, meaning each new allocation growth will need also a copy of the previous allocation to the new one, in case there was no space for the new allocation size in the old allocated memory area.

Here the reserve can help, if you know the max elements in the vector. When you use reserve, there will be only one allocation and no memory copy, unless you pass the reserved items.

When you tell the exact reserved count, you get the exact memory you asked. When you just add elements (even with resize, you don't say you wouldn't add more items.