I've read that a vector
-of-vector
s is bad given a fixed 2nd dimension, but I cannot find a clear explanation of the problems on http://www.stackoverflow.com.
Could someone give an explanation of why using 2D indexing on a single vector
is preferable to using a vector
-of-vector
s for a fixed 2nd dimension?
Also, I'm assuming that a vector
-of-vector
s is the preferable data structure for a 2D array with a variable 2nd dimension? If there is any proof to the contrary I'd love to see that.
I shall answer with a simple analogy.
What is "better" in general out of the two things?
- A telephone book where each entry is a code referring to a different book that you have to find and read to discover someone's telephone number
- A telephone book that lists people's telephone numbers
Keeping all your data in a single big blob is more simple, more sensible, and easier on your computer's cache. A vector with N vectors inside it is much more operationally complex (remember, each of those requires a dynamic allocation and size management operations!); one vector is, well, one vector. You haven't multiplied the workload by N.
The only downside really is that to simulate 2D array access with a 1D underlying data store, you need to write a facade. Fortunately, this is very easy.
Now for the subjective part: on balance I'd say that it's worthwhile unless you're really in a rush and your code quality doesn't particularly matter.
For a std::vector
the underlying array is dynamically allocated from the heap. If you have e.g. std::vector<std::vector<double>>
, then your outer vector would look like
{v1, v2, v3, v4, ... vn}
This looks like each of the inner vectors will be in contiguous memory, and they will, but their underlying arrays will not be contiguous. See the diagram of the memory layout in this post. In other words the you cannot say that
&(v1.back()) + 1 == &(v2.front()) // not necessarily true!
Instead if you used a single vector with striding then you would gain data locality, and it would inherently be more cache friendly as all your data is contiguous.
For the sake of completeness, I would use neither of these methods if your matrix was sparse, as there are more elegant and efficient storage schemes than straight 1D or 2D arrays. Though since you mentioned you have a "fixed 2nd dimension" I will assume that is not the case here.
Using a vector of vectors:
- Is inefficient in terms of memory allocation, due to multiple blocks being allocated.
- Models a jagged right hand edge, so bugs can creep in.
Using a single vector is, in general, better as the memory management is simpler. But you can encounter problems if your matrix is large as it can be difficult to acquire a large contiguous block.
If your array is resizeable, then I'd still stick to a single vector: the resize complexity can be isolated in a single function that you can optimise.
The best solution of all is, of course, to use something like the linear algebra library (BLAS), available in Boost. That also handles large sparse matrices beautifully.