What happens under the hood of vector::push_back m

2020-02-05 04:06发布

问题:

My question is regarding the effect of vector::push_back, I know it adds an element in the end of the vector but what happens underneath the hood?

IIRC memory objects are allocated in a sequential manner, so my question is whether vector::push_back simply allocates more memory immediately after the vector, and if so what happens if there is not enough free memory in that location? Or perhaps a pointer is added in the "end" to cause the vector to "hop" to the location it continues? Or is it simply reallocated through copying it to another location that has enough space and the old copy gets discarded? Or maybe something else?

回答1:

If there is enough space already allocated, the object is copy constructed from the argument in place. When there is not enough memory, the vector will grow it's internal databuffer following some kind of geometric progression (each time the new size will be k*old_size with k > 1[1]) and all objects present in the original buffer will then be moved to the new buffer. After the operation completes the old buffer will be released to the system.

In the previous sentence move is not used in the technical move-constructor/ move-assignment sense, they could be moved or copied or any equivalent operation.

[1] Growing by a factor k > 1 ensures that the amortized cost of push_back is constant. The actual constant varies from one implementation to another (Dinkumware uses 1.5, gcc uses 2). The amortized cost means that even if every so often one push_back will be highly expensive (O(N) on the size of the vector at the time), those cases happen rarely enough that the cost of all operations over the whole set of insertions is linear in the number of insertions, and thus each insertion averages a constant cost)



回答2:

When vector is out of space, it will use it's allocator to reserve more space.

It is up to the allocator to decide how this is implemented.

However, the vector decides how much space to reserve: the standard guarantees that the vector capacity shall grow by at least a factor of 1.51 geometrically (see comment), thus preventing horrible performance due to repeated 'small' allocations.

On the physical move/copy of elements:

  • c++11 conforming implementations will move elements if they support move assignment and construction
  • most implementations I know of (g++ notably) will just use std::copy for POD types; the algorithm specialisation for POD types ensures that this compiles into (essentially) a memcpy operation. This in turn gets compiled in whatever CPU instruction is fastest on your system (e.g. SSE2 instructions)

1 I tried finding the reference quote for that from the n3242 standard draft document, but I was unable to find it at this time



回答3:

A vector gurantees that all elements are contigious in memory.

Internally you can think of it as defined as three pointers (or what act like pointers):

start:     Points at the beginning of the allocated block.
final:     Points one past the last element in the vector.
           If the vector is empty then start == final 
capacity:  Points one past the end of allocated memory.
           If final == capacity there is no room left.

When you push back.

  1. If final is smaller than capacity:
    • the new element is copied into the location pointed at by final
    • final is incremented to the next location.
  2. If final is the same as capacity then the vector is full
    • new memory must be allocated.
    • The compiler will then allocate X*(capacity - start)*sizeof(t) bytes.
    • where X is usually a value between 1.5 and 2.
    • It then copies all the values from the old memory buffer to the new memory buffer.
    • the new value is added to the buffer.
    • Transfers start/final/capacity pointers.
    • Free's up the old buffer


回答4:

When vector runs out of space, it is reallocated and all the elements are copied over to the new array. The old array is then destroyed.

To avoid an excessive number of allocations and to keep the average push_back() time at O(1), a reallocation requires that the size be increased by at least a constant factor. (1.5 and 2 are common)



回答5:

When you call vector::push_back the end pointer is compared to the capacity pointer. If there is enough room for the new object placement new is called to construct the object in the available space and the end pointer is incremented.

If there isn't enough room the vector calls its allocator to allocate enough contiguous space for at least the existing elements plus new element (different implementation may grow the allocated memory by different multipliers). Then all existing elements plus the new one are copied to the newly allocated space.



回答6:

std::vector overallocates - it will usually allocate more memory than necessary automatically. sizeis not affectd by this, but you can control that through capacity.

std::vector will copy everything if the additional capacity is not sufficient.

The memory allocated by std::vector is raw, no constructors are called on demand, using placement new.

So, push_back does:

  • if capacity is not sufficient for the new element, it will
    • allocate a new block
    • copy all existing elements (usually using the copy constructor)
  • increase size by one
  • copy the new element to the new location


回答7:

If you have some idea of what will be the final size of your array, try to vector::reserve the memory first. Note that reserve is different from vector::resize. With reserve the vector::size() of your array is not changed