I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value.
So, sparse_vector would store the lazily-sorted index-value pairs for all indices whose value is not T().
I am basing my implementation on existing sparse vectors in numeric libraries, though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector and Eigen::SparseVector.
Both store:
size_t* indices_; // a dynamic array
T* values_; // a dynamic array
int size_;
int capacity_;
Why don't they simply use the following?
vector<pair<size_t, T>> data_;
My main question: what are the pros and cons of each layout, and which is ultimately better?
The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference.
The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type?
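To put rough numbers on the size side of this trade-off, here is a minimal sketch that compares per-element storage for the two layouts. The helper names `pair_bytes` and `split_bytes` are hypothetical (they are not part of uBLAS or Eigen), and the exact values depend on the platform ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Bytes per element when index and value sit together in one pair.
// The pair is padded up to the alignment of its strictest member.
template <typename T>
constexpr std::size_t pair_bytes() {
    return sizeof(std::pair<std::uint64_t, T>);
}

// Bytes per element when indices and values sit in two separate arrays.
// Each array is densely packed, so no padding is paid between index and value.
template <typename T>
constexpr std::size_t split_bytes() {
    return sizeof(std::uint64_t) + sizeof(T);
}
```

With a 1-byte T the pair layout pays for padding, while with an 8-byte T both layouts typically cost the same per element. In that case any advantage of the split layout would come from search locality rather than from size.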
It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?
Effectively, it seems that they reinvented the wheel (so to speak).
I would personally consider 2 libraries for your need:
- Loki::AssocVector -> the interface of a map implemented over a vector (which is what you wish to do)
- the iterator_adaptor class from Boost.Iterator, which makes it very easy to implement a new container by composition
As a remark, I would note that you may wish to be a little more generic than "values different from T()", because this requires T to be DefaultConstructible. You could provide a constructor which takes a T const&. When writing a generic container it is good to try and reduce the necessary requirements as much as possible (as long as it does not hurt performance).
Also, I would remind you that using a vector for storage is a very good idea for a small number of values, but you might wish to switch the underlying container to a classic map or unordered_map if the number of values grows. It could be worth profiling/timing. Note that the STL offers this ability with the container adapters such as stack, even though it could make the implementation slightly harder.
Have fun.
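Two of the suggestions above (sorted vector storage behind a map-like interface, and a caller-supplied "empty" value instead of T()) can be combined in a minimal sketch like the one below. This is not Loki's code. The member names `get`, `set` and `stored` are made up for illustration, and the sketch assumes T is copyable and comparable with `==`.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class sparse_vector {
public:
    // The "empty" value is passed in, so T need not be DefaultConstructible.
    explicit sparse_vector(T const& default_value) : default_(default_value) {}

    // Read access: indices without an entry report the default value.
    T const& get(std::size_t i) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), i, by_index);
        return (it != data_.end() && it->first == i) ? it->second : default_;
    }

    // Write access: storing the default value removes the entry,
    // so only non-default values are ever kept.
    void set(std::size_t i, T const& v) {
        auto it = std::lower_bound(data_.begin(), data_.end(), i, by_index);
        bool present = (it != data_.end() && it->first == i);
        if (v == default_) {
            if (present) data_.erase(it);
        } else if (present) {
            it->second = v;
        } else {
            data_.insert(it, std::make_pair(i, v));  // keeps data_ sorted by index
        }
    }

    // Number of entries actually stored.
    std::size_t stored() const { return data_.size(); }

private:
    typedef std::pair<std::size_t, T> entry;

    // Ordering used by lower_bound: compare an entry's index with a plain index.
    static bool by_index(entry const& e, std::size_t i) { return e.first < i; }

    T default_;
    std::vector<entry> data_;
};
```

Lookups are binary searches over the sorted vector. Insertions and erasures shift elements, which is why a map may become the better choice as the number of stored values grows.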
Having indices in a separate list would make them faster to look up - as you suggest, it would use the cache more effectively, particularly if T is large.
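As a rough illustration of that lookup, here is a sketch of the split layout in which the binary search only ever touches the index array. The `split_sparse` struct and its `find` member are hypothetical names; std::vector stands in for the raw arrays that the libraries manage by hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical split layout: indices_ is kept sorted,
// and values_[k] is the value stored at index indices_[k].
template <typename T>
struct split_sparse {
    std::vector<std::size_t> indices_;
    std::vector<T> values_;

    // Returns a pointer to the value stored at index i, or nullptr if absent.
    // The search reads only indices_; values_ is touched once, on a hit.
    const T* find(std::size_t i) const {
        auto it = std::lower_bound(indices_.begin(), indices_.end(), i);
        if (it == indices_.end() || *it != i) return nullptr;
        return &values_[static_cast<std::size_t>(it - indices_.begin())];
    }
};
```

Whether this actually beats a vector of pairs depends on sizeof(T) and on the access pattern, so it is worth measuring before committing to either layout.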
If you want to implement your own, why not just use std::map (or std::unordered_map)? Keys would be larger but implementation time would be close to zero!
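A minimal sketch of what that could look like, assuming T is DefaultConstructible and comparable with `==` (the class name `map_sparse_vector` and its members are made up for illustration):

```cpp
#include <cstddef>
#include <map>

template <typename T>
class map_sparse_vector {
public:
    // Returns the stored value, or T() when index i has no entry.
    T get(std::size_t i) const {
        typename std::map<std::size_t, T>::const_iterator it = data_.find(i);
        return it == data_.end() ? T() : it->second;
    }

    // Storing T() erases the entry, so only non-default values are kept.
    void set(std::size_t i, T const& v) {
        if (v == T()) data_.erase(i);
        else data_[i] = v;
    }

    // Number of entries actually stored.
    std::size_t stored() const { return data_.size(); }

private:
    std::map<std::size_t, T> data_;
};
```

The map keeps entries ordered by index, so iterating in index order comes for free. The cost is per-node memory overhead and pointer chasing during lookups, which is the trade-off to time against the array-based layouts.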