Context
I have a case where multiple threads must update objects stored in a shared vector. However, the vector is very large, and the number of elements to update is relatively small.
Problem
In a minimal example, the set of elements to update can be identified by a (hash-)set containing the indices of elements to update. The code could hence look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
    let indices_to_update = compute_indices();
    // Rejected by the compiler: the closure would mutably borrow the vector on every thread.
    indices_to_update
        .par_iter() // Rayon parallel iteration
        .map(|index| big_vector_of_elements[*index].mutate())
        .collect()?;
}
This is obviously disallowed in Rust: big_vector_of_elements cannot be borrowed mutably in multiple threads at the same time. However, wrapping each element in, e.g., a Mutex lock seems unnecessary: this specific case would be safe without explicit synchronization. Since the indices come from a set, they are guaranteed to be distinct. No two iterations in the par_iter touch the same element of the vector.
Restating my question
What would be the best way of writing a program that mutates elements in a vector in parallel, where the synchronization is already taken care of by the selection of indices, but where the compiler does not understand the latter?
A near-optimal solution would be to wrap all elements in big_vector_of_elements in some hypothetical UncontendedMutex lock, which would be a variant of Mutex that is ridiculously fast in the uncontended case, and that may take arbitrarily long (or even panic) when contention occurs. Ideally, an UncontendedMutex<T> should also have the same size and alignment as T, for any T.
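To make the idea concrete, here is a rough sketch of what such a type could look like. Everything in it is an assumption made for illustration: the name UncontendedMutex, the closure-based `with` method, and the choice to panic on contention. It also stores an extra AtomicBool, so it does not meet the "same size and alignment as T" wish. Scoped threads from the standard library stand in for Rayon so that the sketch needs no external crate.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical cell that panics on contention instead of blocking.
pub struct UncontendedMutex<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: `with` hands out at most one `&mut T` at a time, guarded by `locked`.
unsafe impl<T: Send> Sync for UncontendedMutex<T> {}

impl<T> UncontendedMutex<T> {
    pub fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Runs `f` with exclusive access; panics if another access is in progress.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Clears the flag when dropped, including when `f` panics.
        struct Unlock<'a>(&'a AtomicBool);
        impl<'a> Drop for Unlock<'a> {
            fn drop(&mut self) {
                self.0.store(false, Ordering::Release);
            }
        }

        // A single atomic swap is the whole cost in the uncontended case.
        if self.locked.swap(true, Ordering::Acquire) {
            panic!("UncontendedMutex was contended");
        }
        let _unlock = Unlock(&self.locked);
        // SAFETY: the flag was clear, so no other `&mut T` is live.
        f(unsafe { &mut *self.value.get() })
    }

    pub fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

/// Updates a few distinct indices of a larger vector from separate threads.
pub fn example() -> Vec<u32> {
    let data: Vec<UncontendedMutex<u32>> = (0..8).map(UncontendedMutex::new).collect();
    let indices = [1usize, 4, 6];
    std::thread::scope(|s| {
        for &i in &indices {
            let cell = &data[i];
            s.spawn(move || cell.with(|v| *v += 100));
        }
    });
    data.into_iter().map(UncontendedMutex::into_inner).collect()
}
```

Because the indices are distinct, the panic branch is never taken in `example`; it only fires if the "indices are disjoint" assumption is violated.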
Related, but different questions:
Multiple questions can be answered with "use Rayon's parallel iterator", "use chunks_mut", or "use split_at_mut":
- How do I run parallel threads of computation on a partitioned array?
- Processing vec in parallel: how to do safely, or without using unstable features?
- How do I pass disjoint slices from a vector to different threads?
- Can different threads write to different sections of the same Vec?
- How to give each CPU core mutable access to a portion of a Vec?
These answers do not seem relevant here, since those solutions imply iterating over the entire big_vector_of_elements, and then for each element figuring out whether anything needs to be changed. Essentially, this means that such a solution would look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
    let indices_to_update = compute_indices();
    // Visits every element, even though only a few need updating.
    big_vector_of_elements
        .par_iter_mut()
        .enumerate()
        .filter(|(index, _)| indices_to_update.contains(index))
        .try_for_each(|(_, element)| element.mutate())?;
}
This solution takes time proportionate to the size of big_vector_of_elements, whereas the first solution loops only over a number of elements proportionate to the size of indices_to_update.
I think this is a reasonable place to use unsafe code. The logic itself is safe but cannot be checked by the compiler because it relies on knowledge outside of the type system (the contract of BTreeSet, which itself relies on the implementation of Ord and friends for usize).
In this sample, we preemptively bounds check all the indices via range, so each call to add is safe to use. Since we take in a set, we know that all the indices are disjoint, so we aren't introducing mutable aliasing. It's important to get the raw pointer from the slice to avoid aliasing between the slice itself and the returned values.
Once you have used this function to find all the separate mutable references, you can use Rayon to modify them in parallel:
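A minimal sketch of the approach just described, under some assumptions: the helper name uniq_refs and the wrapper example are hypothetical, the references are collected into a Vec, and scoped threads from the standard library stand in for Rayon so that the sketch needs no external crate. With Rayon available, the collected Vec could presumably be handed to a parallel iterator instead.

```rust
use std::collections::BTreeSet;
use std::thread;

/// Hypothetical helper: one `&mut T` per in-bounds index in `indices`.
fn uniq_refs<'d, T>(data: &'d mut [T], indices: &BTreeSet<usize>) -> Vec<&'d mut T> {
    let len = data.len();
    // Take the raw pointer once, so no reference to the whole slice is used
    // while the element references created below are alive.
    let start = data.as_mut_ptr();
    indices
        .range(..len) // drops any index >= len, so `add` stays in bounds
        // SAFETY: `i < len`, and a set never yields the same index twice,
        // so every returned reference points at a distinct element.
        .map(|&i| unsafe { &mut *start.add(i) })
        .collect()
}

/// Increments the selected elements, two references per thread.
fn example(data: &mut [i32], indices: &BTreeSet<usize>) {
    let mut selected = uniq_refs(data, indices);
    thread::scope(|s| {
        for chunk in selected.chunks_mut(2) {
            s.spawn(move || {
                for elem in chunk {
                    **elem += 1;
                }
            });
        }
    });
}
```

Only the number of selected indices determines the amount of work; the rest of the slice is never visited. Out-of-range indices are silently skipped by the range call rather than reported.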
You may be looking for a disjoint-set data structure, a form of partitioning defined by sets of indices to elements of a list. A good Rust implementation of this structure would allow you to safely and efficiently traverse and mutate the values of each set in parallel set-wise, since the sets are known to be disjoint.
Luckily, there is the partitions crate, which provides a disjoint-set implementation. Once a PartitionVec is built, each set can be iterated independently using the all_sets_mut() method¹. The following code uses rayon to process three sets of numbers in parallel, each with 2 elements.
The output:
The rest of the problem lies in building this partitioned vector, but the crate already has facilities for turning a standard Vec into a PartitionVec and back. By default, each value is assigned to a singleton set. The function compute_indices() proposed in the question would manipulate this vector to create the intended sets beforehand.
¹ Probably due to an implementation detail (as of version 0.2.4), the corresponding iterator for immutable access, obtained with all_sets(), cannot be safely moved between threads, making it unsuitable for parallel processing.
You can sort indices_to_update and extract mutable references by calling split_*_mut.
Double check everything in this code; I didn't test it. Then you can call elems.par_iter(...) or whatever.
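A minimal sketch of the sort-then-split approach, using only safe code. The helper name disjoint_refs and the wrapper example are hypothetical, and scoped threads from the standard library stand in for Rayon so that the sketch needs no external crate.

```rust
use std::thread;

/// Hypothetical helper: `sorted_indices` must be strictly increasing.
fn disjoint_refs<'a, T>(mut data: &'a mut [T], sorted_indices: &[usize]) -> Vec<&'a mut T> {
    let mut refs = Vec::with_capacity(sorted_indices.len());
    let mut offset = 0; // index in the original slice of the current `data[0]`
    for &index in sorted_indices {
        assert!(index >= offset, "indices must be sorted and distinct");
        // Skip the untouched elements in front of `index`
        // (`split_at_mut` panics if that runs past the end)...
        let (_, tail) = std::mem::take(&mut data).split_at_mut(index - offset);
        // ...then peel off the element itself and keep the remainder.
        let (elem, rest) = tail.split_first_mut().expect("index out of bounds");
        refs.push(elem);
        data = rest;
        offset = index + 1;
    }
    refs
}

/// Multiplies the selected elements by 10, two references per thread.
fn example(data: &mut [i32], mut indices: Vec<usize>) {
    indices.sort_unstable();
    indices.dedup();
    let mut elems = disjoint_refs(data, &indices);
    thread::scope(|s| {
        for chunk in elems.chunks_mut(2) {
            s.spawn(move || {
                for elem in chunk {
                    **elem *= 10;
                }
            });
        }
    });
}
```

Compared with the unsafe version, this costs a sort over the indices, but the borrow checker can verify that the returned references never overlap.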