What happens to the underlying storage upon vector

2019-01-18 13:08发布

问题:

For std::vector's copy assignment, is reallocation of storage and shrink of capacity allowed when the source's size is smaller than the destination's capacity? Or is it guaranteed that the reallocation/shrink will not happen (i.e. always respect previous reserve())?

On the other side, if the source's size is bigger than the destination's capacity and a reallocation takes place, is it required that the reallocation respect the source's capacity (e.g. the destination's new capacity should be no smaller than the source's capacity, or even require them to be the same)? Or the reallocation simply does its job (based on the new size) with no regard to the source's capacity?

As for move assignment, I suppose no storage reallocation will take place (though I failed to locate relevant part in the standard), so does it mean the value of the destination's new capacity will be exactly the same to the source's old capacity? Can I expect v = vector<T>{}; to have the same effect as vector<T>{}.swap(v);?

I suppose the answers are buried somewhere in the standard, but I just failed to find them. (In case things are different for C++11 and C++03, I would like to know various requirements from both.)

PS: For whatever answer of the above questions, is it the same for std::string (only in C++11 which means contiguous storage and no COW, C++03 string is out of the radar)?

回答1:

std::vector<T,A>( std::vector<T,A>&& )

this is guaranteed to be constant time (N3797 Table 99 X u(rv)).

There is no way known to myself to move an arbitrary sized vector in constant time without just moving the pointers to the buffer. If this (there is no way) is true, then the constructed vector must have a buffer at least as larger as the source buffer. There is no wording in the standard that states vectors need be efficient, however: the right hand side vector can have its capacity reduced to any value greater than or equal to its size: the postcondition is only that the elements are the same. In theory it could even be larger capacity, if the right hand side vector had "hidden capacity" that the compiler chose to expose for whatever reason (the phase of the moon, say).

At no point in the N3797 standard is an upper bound on capacity placed on any container. A conforming implementation can have all std::vectors have a minimum 2 million element capacity (barring allocator failure -- which could be used to force a capacity of 0), with no operation capable of reducing that value below 2 million. (shrink_to_fit is just a suggestion, and std::vector<T,A>().swap(x) could create a vector of 2 million capacity and swap it.

As much of the above is of a negative form, all I can say is search the standard for every mention of vector, allocator, allocate, and capacity. capacity is never described with an upper bound at any point. No restrictions (other than exception-safety) are made against extra calls to the allocator in an empty std::vector constructor (if the allocator failed, you might have to stay size 0 and capacity 0, but extracting that state to anything that doesn't use the same allocator is challenging).


As for copy assignment and move assignment, the copy assignment places no guarantees on capacity beyond the most basic (that capacity() >= size()).

For move-assignment, it depends on how this applies:

23.2.1 [container.requirements.general] /10

Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.

a = rv (aka std::vector<T,A>& operator=(std::vector<T,A>&&)) case from Table 96 and Table 99 are what concern us. Neither mention that the values contained in rv are destroyed, nor that iterators to them are invalidated. As such, under 23.2.1/10 the iterators are not invalidated.

This does not, however, demand that the buffer be moved. Either the buffer is moved from the rhs to the lhs, or it remains intact in the rhs vector. Table 99 mentions that case implicitly when it says that the lhs items can be move-assigned to (which is the only way a std::array can work).

As std::array has no choice but to move elements, and std::vector gives no further guarantees about moving the buffer, the buffer does not have to be moved. It appears that moving the buffer is a legal implementation, however.


In practice, std::vector<T,A>( std::vector<T,A> const& ) will perform a copy of the contents of the right hand side into the left hand side, and in every implementation I have checked the left hand side's capacity is equal to the size of the resulting vector. Similarly, std::vector<T,A>( ForwardIterator, ForwardIterator ) will produce a vector that just-fits its input.

Note that std::vector<T,A>::operator=(std::vector<T,A>&&) remains linear in complexity.



回答2:

I can find nothing in the standard which would allow assigning a vector to one with sufficient capacity to reduce the capacity. If I've done reserve before the assignment, I'm guaranteed that iterators won't be invalidated by a reallocation as long as the vector never becomes larger than the capacity I reserved.

The issue with move assignment is particular. There doesn't seem to be any special case to allow it to invalidate iterators (unless the source is larger than the capacity of the destination), but this sort of defeats the goal of move assignment. I suspect that this is a defect in the standard.

EDIT:

For what it's worth, in table 96, for a = rv (where a is a container, and rv is a non-const r-value of the same type of container), the standard gives linear complexity, and says that "all existing elements a a are either move assigned to or destroyed". So apparently, the intent of the standard is for the capacity not to be reduced; the execution-time benefits of move only apply to the individual elements, not to the container itself.



回答3:

To answer your PS: At least for std::string, you cannot assume that the handling of strings are similar to the handling of a vector of chararacters.

COW (Copy-on-Write) is not mandatory to implement string, and not all library-implementations do this.