Thread safe vector

2019-01-27 19:51发布

问题:

Let me start by saying that I have read most SO and other topics on the subject.

The way I understand things, std::vector will reallocate memory when pushing back new items, which is my case, unless I have reserved enough space (which is not my case).

What I have is a vector of std::shared_ptr, and that vector holds unique objects (or more correctly, pointers to unique objects in the vector).

The handling of those objects via pointers is wrapped around a Factory & Handler class, but pointers to the objects are accessible from outside the wrapper class and can have member values modified. There is no deleting happening at any time.

If I am understanding correctly issues raised in previous SO questions about std::vector and thread safety, adding (push_back) new objects may invalidate previous pointers, as the vector internally may reallocate memory and copy everything over, which would of course be a disaster for me.

My intentions are to read from that vector, often modifying objects through the pointers, and add new items to the vector, from threads running asynchronously.

So,

  1. Using atomic or mutexes is not enough? If I push back from one thread, another thread handling an object via pointer may end up having an invalid object?
  2. Is there a library that can handle this form of MT issues? The one I keep reading about is Intel's TBB, but since I'm already using C++11, I'd love to keep changes to a minimum, even if it means more work on my part - I want to learn in the process, not just copy-paste.
  3. Other than locking access while modifying objects, I would want asynchronous parallel read access to the vector that will not be invalidated by push_backs. How can I achieve that?

If it is of any importance, all the above is on linux (debian jessie) using gcc-4.8 with c++11 enabled.

I am open to using minimally invasive libraries.

Many thanks in advance :-)

回答1:

adding (push_back) new objects may invalidate previous pointers ...

No, this operation doesn't invalidate any previous pointers, unless you are refering to addresses inside the vectors internal data management (which clearly isn't your scenario).
If you store raw pointers, or std::shared_ptr's there, those will be simply copied, and not get invalid.


As mentioned in comments a std::vector isn't very suitable to guarantee thread safety for producer / consumer patterns for a number of reasons. Neither storing raw pointers to reference the alive instances is!

A Queue will be much better to support this. As for standards you can use the std::deque to have ceratain access points (front(),back()) for the producer / consumer.

To make these access point's thread safe (for pushing/popping values) you can easily wrap them with your own class and use a mutex along, to secure insertion/deletion operations on the shared queue reference.
The other (and major, as from your question) point is: manage ownership and lifetime of the contained/referenced instances. You may also transfer ownership to the consumer, if that's suitable for your use case (thus getting off from the overhead with e.g. std::unique_ptr), see below ...

Additionally you may have a semaphore (condition variable), to notify the consumer thread, that new data is available.


'1. Using atomic or mutexes is not enough? If I push back from one thread, another thread handling an object via pointer may end up having an invalid object?'

The lifetime (and thus thread safe use) of the instances stored to the queue (shared container) need's to be managed separately (e.g. using smart pointers like std::shared_ptr or std::unique_ptr stored there).

'2. Is there a library ...'

It can be achieved all well with the existing standard library mechanisms IMHO.

As for point 3. see what's written above. As what I can tell further about this, it sounds like you're asking for something like a rw_lock mutex. You may provide a surrogate for this with a suitable condition variable.

Feel free to ask for more clarification ...



回答2:

If you will be always just adding new items to the container and then accessing them, what you could find useful is a vector with another indirection, so that instead of swaping the internal buffer for a bigger one, the once allocated space is never freed and a new space is just appended somehow in a thread-safe manner.

For example, it can look like this:

concurrent_vector<Object*>:
  size_t m_baseSize = 1000
  size_t m_size     = 3500
  part*  m_parts[6] = {
    part* part1, ----> Object*[1000]
    part* part2, ----> Object*[2000]
    part* part3, ----> Object*[4000]
    NULL,
    ...
  }

The class contains a fixed array of pointers to individual chunks of memory with the items, with exponentially increasing size. Here the limit is 6 parts, so 63000 items - but this can be changed easily.

The container starts with all parts pointers set to NULL. If an item is added, the 1st chunk is created, with size m_baseSize, here 1000, and saved to m_parts[0]. Subsequent items are written there.

When the chunk is full, another buffer is allocated, with twice the size of the previous one (2000), and stored into m_parts[1]. This is repeated as required.

All this can be done using atomic operations, but that is of course tricky. It can be simpler if all writers can be protected by a mutex and only readers are fully concurrent (e.g. if writing is much more rare operation). All reader threads always see either NULL in m_parts[i] or NULL in one of the buffers, or a valid pointer. Existing items are never moved in memory, invalidated or anything.


As far as existing libraries go, you might want to look at Thread Building Blocks by Intel, particularly its class concurrent_vector. Reportedly it has these features:

  • Random access by index. The index of the first element is zero.
  • Multiple threads can grow the container and append new elements concurrently.
  • Growing the container does not invalidate existing iterators or indices.


回答3:

Re-reading the question, the situation seems a bit different.

std::vector is ill-suited for storing objects to which you'd have to keep references, since a push_back can invalidate all the references to the stored objects. But, you are storing a bunch of std::shared_ptr.

The std::shared_ptrs stored inside should handle the resizing gracefully (they are being moved, but not the objects they point to) as long as, in the threads, you don't keep references to std::shared_ptrs stored inside the vector, but you keep copies of them.

Both using std::vector and std::deque you have to synchronize the access to the data structure, since a push_back, although not reference-invalidating, alters the internal structures of the deque, and thus is not allowed to run simultaneously with a deque access.

OTOH, std::deque is probably better suited anyway for performance reasons; at each resize you are moving around a lot of std::shared_ptr, which may have to do a locked increment/decrement to the refcount in case of copy/delete (if they are moved - as they should - this should be elided - but YMMV).

But most importantly, if your usage of std::shared_ptr is just to avoid the potential moves in the vector, you may drop it completely when using a deque, since the references are not invalidated, so you can store your objects directly in the deque instead of using heap allocation and the indirection/overhead of std::shared_ptr.