Shared ownership of an str between a HashMap and a

2019-07-31 07:02发布

I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows to store some data that has a lot of repetitive strings far more efficiently in the file system because only the ids of strings would be stored instead of entire strings.

I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.

I started with the usage of &str as a key for HashMap and an item of Vec like in the following sample. The value of HashMap serves as an index into Vec.

pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<&'a str>
}

impl<'a> Dictionary<'a> {
    pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                self.keys_map.push(value);
                self.values_map.insert(value, id);
                id
            },
            Some(&mut id) => id
        }
    }
}

This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.

pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<Box<str>>
}

The borrow checker did not allow this of course because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or in fact sometimes when another item is added to the Vec but this is an off-topic here).

I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be an Rc. The usage of Rc<Box<str>> looks like introducing double indirection but there seems to be no simple way to construct an Rc<str> at the moment.

pub struct Dictionary {
    values_map: HashMap<Rc<Box<str>>, u32>,
    keys_map: Vec<Rc<Box<str>>>
}

impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                let value_to_store = Rc::new(value.to_owned().into_boxed_str());
                self.keys_map.push(value_to_store);
                self.values_map.insert(value_to_store, id);
                id
            },
            Some(&mut id) => id
        }
    }
}

Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:

error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
  --> src/file_structure/sample_dictionary.rs:14:31
   |
14 |         match self.values_map.get_mut(value) {
   |                               ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
   |
   = help: the following implementations were found:
   = help:   <std::rc::Rc<T> as std::borrow::Borrow<T>>

Questions:

  1. Is there a way to construct an Rc<str>?
  2. Which other structures, methods or approaches could help to resolve this problem. Essentially, I need a way to efficiently store two maps string-by-id and id-by-string and be able to retrieve an id by &str, i.e. without any excessive allocations.

标签: hashmap rust
1条回答
Juvenile、少年°
2楼-- · 2019-07-31 07:26

Is there a way to construct an Rc<str>?

Annoyingly, not that I know of. Rc::new requires a Sized argument, and I am not sure whether it is an actual limitation, or just something which was forgotten.

Which other structures, methods or approaches could help to resolve this problem?

If you look at the signature of get you'll notice:

fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>, Q: Hash + Eq

As a result, you could search by &str if K implements Borrow<str>.

String implements Borrow<str>, so the simplest solution is to simply use String as a key. Sure it means you'll actually have two String instead of one... but it's simple. Certainly, a String is simpler to use than a Box<str> (although it uses 8 more bytes).

If you want to shave off this cost, you can use a custom structure instead:

#[derive(Clone, Debug)]
struct RcStr(Rc<String>);

And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.


If you wish to got further (why not?), here are some ideas:

  • implement your own reference-counted string, in a single heap-allocation,
  • use a single arena for the slice inserted in the Dictionary,
  • ...
查看更多
登录 后发表回答