Why can't I wrap a T* in an std::vector?

2019-01-20 12:31发布

问题:

I have a T* addressing a buffer with len elements of type T. I need this data in the form of an std::vector<T>, for certain reasons. As far as I can tell, I cannot construct a vector which uses my buffer as its internal storage. Why is that?

Notes:

  • Please don't suggest I use iterators - I know that's usually the way around such issues.
  • I don't mind that the vector having to copy data around if it's resized later.
  • This question especially baffles me now that C++ has move semantics. If we can pull an object's storage from under its feet, why not be able to shove in our own?

回答1:

You can.

You write about std::vector<T>, but std::vector takes two template arguments, not just one. The second template argument specifies the allocator type to use, and vector's constructors have overloads that allow passing in a custom instance of that allocator type.

So all you need to do is write an allocator that uses your own internal buffer where possible, and falls back to asking the default allocator when your own internal buffer is full.

The default allocator cannot possibly hope to handle it, since it would have no clue on which bits of memory can be freed and which cannot.


A sample stateful allocator with an internal buffer containing already-constructed elements that should not be overwritten by the vector, including a demonstration of a big gotcha:

struct my_allocator_state {
    void *buf;
    std::size_t len;
    bool bufused;
    const std::type_info *type;
};

template <typename T>
struct my_allocator {
    typedef T value_type;

    my_allocator(T *buf, std::size_t len)
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, len, false, &typeid(T) })) { }

    template <std::size_t N>
    my_allocator(T(&buf)[N])
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, N, false, &typeid(T) })) { }

    template <typename U>
    friend struct my_allocator;

    template <typename U>
    my_allocator(my_allocator<U> other)
        : def(), state(other.state) { }

    T *allocate(std::size_t n)
    {
        if (!state->bufused && n == state->len && typeid(T) == *state->type)
        {
            state->bufused = true;
            return static_cast<T *>(state->buf);
        }
        else
            return def.allocate(n);
    }

    void deallocate(T *p, std::size_t n)
    {
        if (p == state->buf)
            state->bufused = false;
        else
            def.deallocate(p, n);
    }

    template <typename...Args>
    void construct(T *c, Args... args)
    {
        if (!in_buffer(c))
            def.construct(c, std::forward<Args>(args)...);
    }

    void destroy(T *c)
    {
        if (!in_buffer(c))
            def.destroy(c);
    }

    friend bool operator==(const my_allocator &a, const my_allocator &b) {
        return a.state == b.state;
    }

    friend bool operator!=(const my_allocator &a, const my_allocator &b) {
        return a.state != b.state;
    }

private:
    std::allocator<T> def;
    std::shared_ptr<my_allocator_state> state;

    bool in_buffer(T *p) {
        return *state->type == typeid(T)
            && points_into_buffer(p, static_cast<T *>(state->buf), state->len);
    }
};

int main()
{
    int buf [] = { 1, 2, 3, 4 };
    std::vector<int, my_allocator<int>> v(sizeof buf / sizeof *buf, {}, buf);
    v.resize(3);
    v.push_back(5);
    v.push_back(6);
    for (auto &i : v) std::cout << i << std::endl;
}

Output:

1
2
3
4
6

The push_back of 5 fits into the old buffer, so construction is bypassed. When 6 is added, new memory is allocated, and everything starts acting as normal. You could avoid that problem by adding a method to your allocator to indicate that from that point onward, construction should not be bypassed any longer.

points_into_buffer turned out to be the hardest part to write, and I've omitted that from my answer. The intended semantics should be obvious from how I'm using it. Please see my question here for a portable implementation in my answer there, or if your implementation allows it, use one of the simpler versions in that other question.

By the way, I'm not really happy with how some implementations use rebind in such ways that there is no avoiding storing run-time type info along with the state, but if your implementation doesn't need that, you could make it a bit simpler by making the state a template class (or a nested class) too.



回答2:

The short answer is that a vector can't use your buffer because it wasn't designed that way.

It makes sense, too. If a vector doesn't allocate its own memory, how does it resize the buffer when more items are added? It allocates a new buffer, but what does it do with the old one? Same applies to moving - if the vector doesn't control its own buffer, how can it give control of this buffer to another instance?



回答3:

These days - you no longer need to, you can wrap the T* with an std::span (in C++20; before that - use gsl::span). A span offers you all the convenience of a standard library container - in fact, basically all relevant features of std::vector excluding size changing - with a very thin wrapper class. That's what you want to use, really.

For more on spans, read: What is a "span" and when should I use one?