Is it possible to share a HashMap between threads?

Posted 2019-07-31 23:39

Question:

I would like to share a struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense, since the thread will lock the whole map anyway.

Here's this working version, without threads; I don't think that's necessary for the example.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

fn main() {
    let s = Arc::new(Mutex::new(S::new()));
    let z = s.clone();
    // Locks the whole struct; the temporary guard is dropped immediately.
    let _ = z.lock().unwrap();
}

struct S {
    x: HashMap<u8, Mutex<u8>>, // other non-mutable fields
}

impl S {
    pub fn new() -> S {
        S {
            x: HashMap::default(),
        }
    }
}


Is this possible in any way? Is there something obvious I missed in the documentation?

I've been trying to get this working, but I'm not sure how. In basically every example I see, there's always a Mutex (or RwLock, or something like that) guarding the inner value.

Answer 1:

I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures. What should happen if multiple threads need to insert new values that hash to the same location?

In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.

Here's an example:

use std::collections::HashMap;
use std::sync::{Mutex, RwLock, Arc};
use std::thread;
use std::time::Duration;

fn main() {
    let data = Arc::new(RwLock::new(HashMap::new()));

    let threads: Vec<_> = (0..10)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();

    for t in threads {
        t.join().expect("Thread panicked");
    }

    println!("{:?}", data);
}

fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
    loop {
        // Assume that the element already exists
        let map = data.read().expect("RwLock poisoned");

        if let Some(element) = map.get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");

            // Perform our normal work updating a specific element.
            // The entire HashMap only has a read lock, which
            // means that other threads can access it.
            *element += 1;
            thread::sleep(Duration::from_secs(1));

            return;
        }

        // If we got this far, the element doesn't exist

        // Get rid of our read lock and switch to a write lock
        // You want to minimize the time we hold the writer lock
        drop(map);
        let mut map = data.write().expect("RwLock poisoned");

        // Assume that no other thread will be inserting the same key,
        // otherwise we need to recheck to see if it exists
        thread::sleep(Duration::from_millis(50));
        map.insert(id, Mutex::new(0));
        // Let the loop start us over to try again
    }
}

This takes about 5.5 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
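
As the comment in worker_thread notes, the code assumes no other thread inserts the same key concurrently. A minimal sketch of the recheck (my own addition, not part of the original example), using the entry API so a racing insert isn't clobbered:

// Inside worker_thread, after dropping the read lock:
let mut map = data.write().expect("RwLock poisoned");

// entry() only inserts if the key is still absent, so a value
// inserted by another thread in the meantime is preserved.
map.entry(id).or_insert_with(|| Mutex::new(0));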

This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.

I often advocate for a smarter algorithm of some kind instead. For example, you could spin up N threads, each with its own HashMap, and then shard work among them. For the simple example above, the shard key could be id % N_THREADS. There are also more complicated sharding schemes that depend on your data; a sketch of the channel-based variant follows the quote below.

As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
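
A minimal sketch of that sharding idea (my own illustration; N_THREADS and the counter update are placeholder choices): each shard thread owns its HashMap outright and receives work over a channel, so no locking is needed.

use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

const N_THREADS: usize = 4;

fn main() {
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..N_THREADS {
        let (tx, rx) = mpsc::channel::<u8>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // This thread is the sole owner of its shard; no Mutex needed.
            let mut shard: HashMap<u8, i32> = HashMap::new();
            for id in rx {
                *shard.entry(id).or_insert(0) += 1;
            }
            shard
        }));
    }

    // Route each key to its shard.
    for id in 0..100u8 {
        senders[id as usize % N_THREADS].send(id).unwrap();
    }

    // Close the channels so the shard threads drain their queues and exit.
    drop(senders);

    for handle in handles {
        println!("{:?}", handle.join().expect("Thread panicked"));
    }
}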



Answer 2:

You may want to consider rust-evmap:

A lock-free, eventually consistent, concurrent multi-value map.

The trade-off is eventual consistency: readers do not see changes until the writer refreshes the map. A refresh is atomic, and the writer decides when to perform it and expose the new data to readers.
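
For example, a minimal sketch (my own; it assumes the evmap 6-style API with evmap::new, WriteHandle::insert/refresh, and ReadHandle::get_and — method names vary between versions of the crate, so check its documentation):

fn main() {
    // new() hands back separate read and write handles.
    let (r, mut w) = evmap::new();

    // The insert is staged but not yet visible to readers.
    w.insert(1u8, 10i32);
    assert_eq!(r.get_and(&1u8, |values| values.len()), None);

    // refresh() atomically publishes all pending changes.
    w.refresh();
    assert_eq!(r.get_and(&1u8, |values| values.len()), Some(1));
}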