I have a very large multidimensional vector that changes in size all the time.
Is there any point in using the vector.reserve() function when I only know a good approximation of the sizes?
So basically I have a vector
A[256*256][x][y]
where x goes from 0 to 50 on every iteration of the program and then back to 0 again. The y values can differ every time, which means that for each of the
[256*256][x]
elements the y vector can be of a different size, but still smaller than 256.
So to clarify my problem this is what I have:
vector<vector<vector<int>>> A;
for (int i = 0; i < 256*256; i++) {
    A.push_back(vector<vector<int>>());
    A[i].push_back(vector<int>());
    A[i][0].push_back(SOME_VALUE);
}
Add elements to the vector...
A.clear();
And after this I do the same thing again from the top.
When and how should I reserve space for the vectors?
If I have understood this correctly, I would save a lot of time by using reserve, since I change the sizes all the time?
What would be the negative/positive sides of reserving the maximum size my vector can have, which in some cases would be [256*256][50][256]?
BTW. I am aware of different Matrix Templates and Boost, but have decided to go with vectors on this one...
EDIT:
I was also wondering how to use the reserve function in multidimensional arrays.
If I only reserve the vector in two dimensions will it then copy the whole thing if I exceed its capacity in the third dimension?
To help with discussion you can consider the following typedefs:
typedef std::vector<int> int_t; // internal vector
typedef std::vector<int_t> mid_t; // intermediate
typedef std::vector<mid_t> ext_t; // external
The cost of growing (increasing the capacity of) an int_t vector only affects the contents of that particular vector and no other element. The cost of growing a mid_t vector requires copying all of the elements stored in it, that is, all of its int_t vectors, which is considerably more costly. The cost of growing ext_t is huge: it requires copying every element already stored in the container.
Now, to increase performance, the most important thing is to get the ext_t size correct (it seems to be fixed at 256*256 in your question). Then get the intermediate mid_t size right so that expensive reallocations are rare.
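For example, using the typedefs above, a minimal sketch of reserving those two levels up front; 256*256 is the fixed outer size from the question, while ~50 for each mid_t is only an assumption about the typical case:

ext_t A;
A.reserve(256 * 256);              // outer size is fixed, so ext_t never reallocates

for (std::size_t i = 0; i < 256 * 256; ++i) {
    A.push_back(mid_t());
    A.back().reserve(50);          // assumed typical number of x slots per element
}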
The amount of memory you are talking about is huge, so you might want to consider less standard ways to solve your problem. The first thing that comes to mind is adding an extra level of indirection. If instead of holding the actual vectors you hold smart pointers to the vectors, you can reduce the cost of growing the mid_t and ext_t vectors (if the ext_t size is fixed, just use a vector of mid_t). Now, this will imply that code that uses your data structure will be more complex (or better, add a wrapper that takes care of the indirections). Each int_t vector will be allocated once in memory and will never move in either mid_t or ext_t reallocations. The cost of reallocating a mid_t is proportional to the number of allocated int_t vectors, not to the actual number of inserted integers.
using std::tr1::shared_ptr; // or boost::shared_ptr
typedef std::vector<int> int_t;
typedef std::vector< shared_ptr<int_t> > mid_t;
typedef std::vector< shared_ptr<mid_t> > ext_t;
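A rough sketch of how that could be wired up, building on the typedefs above; SOME_VALUE is the placeholder from the question, and the ~50 reserve is an assumption (in C++11 and later you would use std::shared_ptr and std::make_shared instead of the TR1/Boost variants):

ext_t A;
A.reserve(256 * 256);                          // outer size is fixed, reserve it once

for (std::size_t i = 0; i < 256 * 256; ++i) {
    shared_ptr<mid_t> mid(new mid_t());
    mid->reserve(50);                          // assumed typical number of x slots

    shared_ptr<int_t> inner(new int_t());
    inner->push_back(SOME_VALUE);              // SOME_VALUE as in the question
    mid->push_back(inner);

    A.push_back(mid);                          // growing A now only copies pointers
}

int first = (*(*A[0])[0])[0];                  // access goes through two extra dereferences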
Another thing that you should take into account is that std::vector::clear()
does not free the allocated internal space in the vector, only destroys the contained objects and sets the size to 0. That is, calling clear()
will never release memory. The pattern for actually releasing the allocated memory in a vector is:
typedef std::vector<...> myvector_type;
myvector_type myvector;
...
myvector_type().swap( myvector ); // swap with a default-constructed vector; the temporary frees the old storage when destroyed
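Applied to the A from the question (using the ext_t typedef from above), that becomes:

ext_t().swap(A);   // A is left empty; the old storage is released when the temporary dies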
Whenever you push a vector into another vector, set the size in the pushed vector's constructor:
A.push_back(vector<vector<int>>( somesize ));
You have a working implementation but are concerned about the performance. If your profiling shows it to be a bottleneck, you can consider using a naked C-style array of integers rather than the vector of vectors of vectors.
See how-do-i-work-with-dynamic-multi-dimensional-arrays-in-c for an example
You can re-use the same allocation each time, realloc-ing it as necessary and eventually keeping it at the high-tide mark of usage.
If the vectors are indeed the bottleneck, then once the per-iteration sizing operations are avoided, performance will likely be dominated by your access pattern into the array. Try to access the highest-order dimensions sequentially.
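A rough sketch of that idea, assuming fixed upper bounds of 50 and 256 on the two inner dimensions so a single flat buffer can be reused from one iteration to the next (the names MAX_X/MAX_Y and the bounds are only placeholders):

#include <cstddef>
#include <cstdlib>

const std::size_t N     = 256 * 256;  // outer dimension from the question
const std::size_t MAX_X = 50;         // assumed upper bound on x
const std::size_t MAX_Y = 256;        // assumed upper bound on y

int*        buf      = NULL;
std::size_t capacity = 0;             // high-tide mark, in ints

// Grow the single allocation only when an iteration needs more than ever before.
void ensure_capacity(std::size_t needed)
{
    if (needed > capacity) {
        buf = static_cast<int*>(std::realloc(buf, needed * sizeof(int)));
        capacity = needed;            // error handling omitted for brevity
    }
}

// Row-major indexing: y varies fastest, so looping over i, then x, then y
// walks the buffer sequentially.
std::size_t idx(std::size_t i, std::size_t x, std::size_t y)
{
    return (i * MAX_X + x) * MAX_Y + y;
}

Access is then buf[idx(i, x, y)], and because the capacity only ever grows, later iterations incur no allocations at all.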
If you know the size of a vector at construction time, pass the size to the c'tor and assign using operator[]
instead of push_back
. If you're not totally sure about the final size, make a guess (maybe add a little bit more) and use reserve
to have the vector reserve enough memory upfront.
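A small sketch of both cases; some_value(), more_input() and next_value() are made-up stand-ins for whatever produces your data:

std::vector<int> exact(50);            // final size known: construct with it ...
for (std::size_t i = 0; i < exact.size(); ++i)
    exact[i] = some_value(i);          // ... and assign through operator[]

std::vector<int> guessed;
guessed.reserve(60);                   // good guess plus a little slack
while (more_input())
    guessed.push_back(next_value());   // no reallocation as long as the guess holds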
What would be the negative/positive sides of reserving the maximum size my vector can have which would be [256*256][50][256] in some cases.
Negative side: potential waste of memory. Positive side: less CPU time, less heap fragmentation. It's a memory/cpu tradeoff, the optimum choice depends on your application. If you're not memory-bound (on most consumer machines there's more than enough RAM), consider reserving upfront.
To decide how much memory to reserve, look at the average memory consumption, not at the peak (reserving 256*256*50*256 is not a good idea unless such dimensions are needed regularly).