How does std::vector support contiguous memory for

Posted 2019-06-25 09:22

Question:

I'm struggling with the correct mental model and understanding of std::vector.

What I thought I knew

When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory that is N * sizeof(T) bytes. For example,

// Initialize a vector of int
std::vector<int> intvec;

// Reserve a contiguous block of four 4-byte chunks of memory
intvec.reserve(4);  // [ | | | ]

// Filling in the memory chunks has obvious behavior:
intvec.push_back(1);  // [1| | | ]
intvec.push_back(2);  // [1|2| | ]

Then we can access any element in constant time (random access) because, if we ask for the kth element of the vector, we simply start at the memory address of the start of the vector and then "jump" k * sizeof(T) bytes to get to the kth element.

Custom Objects

My mental model breaks down for custom objects of unknown/varying size. For example,

class Foo {

public:
    Foo() = default;
    Foo(std::vector<int> vec): _vec{vec} {}

private:
    std::vector<int> _vec;
};

int main() {

    // Initialize a vector of Foo
    std::vector<Foo> foovec;

    // Reserve a contiguous block of 4 ?-byte chunks of memory
    foovec.reserve(4);  // [ | | | ]

    // How does memory allocation work since object sizes are unknown?
    foovec.emplace_back(std::vector<int> {1,2});        // [{1,2}| | | ]
    foovec.emplace_back(std::vector<int> {1,2,3,4,5});  // [{1,2}|{1,2,3,4,5}| | ]

    return 0;
}

Since we don't know the size of each instance of Foo, how does foovec.reserve() allocate memory? Furthermore, how could you achieve constant-time random access if we don't know how far to "jump" to get to the kth element?

Answer 1:

Your concept of size is flawed. A std::vector<type> object has a fixed size that is known at compile time. Separately, it owns storage whose size is decided at run time: that storage is allocated at run time and the vector object holds a pointer to it. You can picture it laid out like this:

+--------+
|        |
| Vector |
|        |
|        |
+--------+
     |
     |
     v
+-------------------------------------------------+
|         |         |         |         |         |
| Element | Element | Element | Element | Element |
|         |         |         |         |         |
+-------------------------------------------------+

So when you have a vector of objects that each contain a vector, each Element is itself such an object, and each of those points to its own storage somewhere else, like this:

+--------+
|        |
| Vector |
|        |
|        |
+----+---+
     |
     |
     v
+----+----+---------+---------+
| Object  | Object  | Object  |
|  with   |  with   |  with   |
| Vector  | Vector  | Vector  |
+----+----+----+----+----+----+
     |         |         |   +---------+---------+---------+---------+---------+
     |         |         |   |         |         |         |         |         |
     |         |         +-->+ Element | Element | Element | Element | Element |
     |         |             |         |         |         |         |         |
     |         |             +-------------------------------------------------+
     |         |    +-------------------------------------------------+
     |         |    |         |         |         |         |         |
     |         +--->+ Element | Element | Element | Element | Element |
     |              |         |         |         |         |         |
     |              +-------------------------------------------------+
     |    +-------------------------------------------------+
     |    |         |         |         |         |         |
     +--->+ Element | Element | Element | Element | Element |
          |         |         |         |         |         |
          +---------+---------+---------+---------+---------+

This way all of the objects (and the vectors inside them) sit next to each other, but the elements those inner vectors own can be anywhere else in memory. This is why you don't want to use a std::vector<std::vector<int>> for a matrix: each sub-vector gets its memory wherever the allocator happens to put it, so there is no locality between the rows.



Answer 2:

The size of

class Foo {

public:
    Foo() = default;
    Foo(std::vector<int> vec): _vec{vec} {}

private:
    std::vector<int> _vec;
};

is known and constant. The internal std::vector allocates its elements on the heap, so there is no problem doing foovec.reserve(4);

Otherwise, how could a std::vector live on the stack? ;-)



Answer 3:

The size of your class Foo is known at compile time. A std::vector object has a constant size because the elements it holds are allocated on the heap, not inside the object.

std::vector<int> empty{};
std::vector<int> full{};
full.resize(1000000);
assert(sizeof(empty) == sizeof(full));

Both instances of std::vector<int>, empty and full, always have the same size even though they hold different numbers of elements.

If you want an array that cannot be resized and whose size is known at compile time, use std::array.



Answer 4:

"When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory"

The compiler does no such thing. It generates code to request storage from the vector's allocator at runtime. By default this is std::allocator, which delegates to operator new, which will fetch uninitialized storage from the runtime system.

"My mental model breaks down for custom objects of unknown/varying size"

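The only way a user-defined type can actually have unknown size is if it is incomplete, and a vector cannot store elements of an incomplete type. (Since C++17 you may name std::vector<T> for an incomplete T, but T must be complete before any member of the vector is used.)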

At any point in your code where the type is complete, its size is also fixed, and you can declare a vector storing that type as usual.


Your Foo is complete, and its size is fixed at compile time. You can check this with sizeof(Foo), and sizeof(foovec[0]) etc.

The vector owns a variable amount of storage, but doesn't contain it in the object. It just stores a pointer and the reserved & used sizes (or something equivalent). For example, an instance of:

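#include <cstddef>  // std::size_t

class toyvec {
  int *begin_;            // start of the dynamically allocated buffer
  int *end_;              // one past the last element in use
  std::size_t capacity_;  // number of elements the buffer can hold
public:
  // push_back, begin, end, and all other methods
};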

always has the fixed size sizeof(toyvec) = 2 * sizeof(int*) + sizeof(size_t) + maybe_some_padding. Allocating a huge block of memory, and setting begin_ to the start of it, has no effect on the size of the pointer itself.


tl;dr C++ does not have dynamically-resizing objects. The size of an object is fixed permanently by the class definition. C++ does have objects which own - and may resize - dynamic storage, but that isn't part of the object itself.



Tags: c++ vector