Why the libc++ std::vector internally keeps three

2019-01-18 20:22发布

问题:

I'm looking at the implementation of std::vector in libc++ and I noticed that it internally keeps three pointers (one to the begin, one the end, and one to the end of the allocated memory) instead of what I'd instinctively do, i.e., one pointer to the begin and two size and capacity members.

Here is the code from libc++'s <vector> (ignore the compressed pair, I know what it means).

pointer                                    __begin_;
pointer                                    __end_;
__compressed_pair<pointer, allocator_type> __end_cap_;

I noticed that also other standard libraries do the same (e.g. Visual C++). I don't see any particular reason why this solution should be faster than the other one, but I might be wrong.

So is there a particular reason the "three pointers" solution is preferred to the "pointer + sizes" one?

回答1:

It's because the rationale is that performance should be optimized for iterators, not indices.
(In other words, performance should be optimized for begin()/end(), not size()/operator[].)
Why? Because iterators are generalized pointers, and thus C++ encourages their use, and in return ensures that their performance matches those of raw pointers when the two are equivalent.

To see why it's a performance issue, notice that the typical for loop is as follows:

for (It i = items.begin(); i != items.end(); ++i)
    ...

Except in the most trivial cases, if we kept track of sizes instead of pointers, what would happen is that the comparison i != items.end() would turn into i != items.begin() + items.size(), taking more instructions than you'd expect. (The optimizer generally has a hard time factoring out the code in many cases.) This slows things down dramatically in a tight loop, and hence this design is avoided.

(I've verified this is a performance problem when trying to write my own replacement for std::vector.)


Edit: As Yakk pointed out in the comments, using indices instead of pointers can also result in the generation of a multiplication instruction when the element sizes aren't powers of 2, which is pretty expensive and noticeable in a tight loop. I didn't think of this when writing this answer, but it's a phenomenon that's bitten me before (e.g. see here)... bottom line is, in a tight loop everything matters.



回答2:

It's more convenient for implementers.

Storing size makes exactly one operation easier to implement: size()

size_t size() { return size_; }

on the other hand, it makes other harder to write and makes reusing code harder:

iterator end() { return iterator(end_); } // range version
iterator end() { return iterator(begin_ + size_); } // pointer + size version

void push_back(const T& v) // range version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(end_)) T(v);
    ++end_;
}

void push_back(const T& v) // pointer + size version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(begin_ + size_)) T(v);
    // it could use some internal `get_end` function, but the point stil stands:
    // we need to get to the end
    ++size_;
}

If we have to find the end anyway, we could store it directly - it's more useful than size anyway.



回答3:

I would imagine it's primarily a speed thing. When iterating over the set, the generated instructions for bounds checking would simply be a compare statement with the end pointer (and maybe a load), rather than a load, an add, and a compare (and maybe another load, too).

When generating the iterators for end() and begin(), the code would also just be return pointer;, rather than return pointer + offset; for end().

These are very minor optimizations, but the standard template library is intended to be used in production code where every cycle counts.

PS: In regards to the different compilers implementing it the same way: There is a reference implementation that most (all?) of the compiler vendors base their STL implementations on. It is likely that this particular design decision is a part of the reference implementation, and is why all the implementations you looked at handle vectors this way.