How to implement HashMap with two keys?

2020-01-25 09:42发布

问题:

HashMap implements the get and insert methods which take a single immutable borrow and a single move of a value respectively.

I want a trait which is just like this but which takes two keys instead of one. It uses the map inside, but it's just a detail of implementation.

pub struct Table<A: Eq + Hash, B: Eq + Hash> {
    map: HashMap<(A, B), f64>,
}

impl<A: Eq + Hash, B: Eq + Hash> Memory<A, B> for Table<A, B> {
    fn get(&self, a: &A, b: &B) -> f64 {
        let key: &(A, B) = ??;
        *self.map.get(key).unwrap()
    }

    fn set(&mut self, a: A, b: B, v: f64) {
        self.map.insert((a, b), v);
    }
}

回答1:

This is certainly possible. The signature of get is

fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V> 
where
    K: Borrow<Q>,
    Q: Hash + Eq, 

The problem here is to implement a &Q type such that

  1. (A, B): Borrow<Q>
  2. Q implements Hash + Eq

To satisfy condition (1), we need to think of how to write

fn borrow(self: &(A, B)) -> &Q

The trick is that &Q does not need to be a simple pointer, it can be a trait object! The idea is to create a trait Q which will have two implementations:

impl Q for (A, B)
impl Q for (&A, &B)

The Borrow implementation will simply return self and we can construct a &Q trait object from the two elements separately.


The full implementation is like this:

use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::borrow::Borrow;

// See explanation (1).
#[derive(PartialEq, Eq, Hash)]
struct Pair<A, B>(A, B);

#[derive(PartialEq, Eq, Hash)]
struct BorrowedPair<'a, 'b, A: 'a, B: 'b>(&'a A, &'b B);

// See explanation (2).
trait KeyPair<A, B> {
    /// Obtains the first element of the pair.
    fn a(&self) -> &A;
    /// Obtains the second element of the pair.
    fn b(&self) -> &B;
}

// See explanation (3).
impl<'a, A, B> Borrow<KeyPair<A, B> + 'a> for Pair<A, B>
where
    A: Eq + Hash + 'a,
    B: Eq + Hash + 'a,
{
    fn borrow(&self) -> &(KeyPair<A, B> + 'a) {
        self
    }
}

// See explanation (4).
impl<'a, A: Hash, B: Hash> Hash for (KeyPair<A, B> + 'a) {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.a().hash(state);
        self.b().hash(state);
    }
}

impl<'a, A: Eq, B: Eq> PartialEq for (KeyPair<A, B> + 'a) {
    fn eq(&self, other: &Self) -> bool {
        self.a() == other.a() && self.b() == other.b()
    }
}

impl<'a, A: Eq, B: Eq> Eq for (KeyPair<A, B> + 'a) {}

// OP's Table struct
pub struct Table<A: Eq + Hash, B: Eq + Hash> {
    map: HashMap<Pair<A, B>, f64>,
}

impl<A: Eq + Hash, B: Eq + Hash> Table<A, B> {
    fn new() -> Self {
        Table { map: HashMap::new() }
    }

    fn get(&self, a: &A, b: &B) -> f64 {
        *self.map.get(&BorrowedPair(a, b) as &KeyPair<A, B>).unwrap()
    }

    fn set(&mut self, a: A, b: B, v: f64) {
        self.map.insert(Pair(a, b), v);
    }
}

// Boring stuff below.

impl<A, B> KeyPair<A, B> for Pair<A, B>
where
    A: Eq + Hash,
    B: Eq + Hash,
{
    fn a(&self) -> &A {
        &self.0
    }
    fn b(&self) -> &B {
        &self.1
    }
}
impl<'a, 'b, A, B> KeyPair<A, B> for BorrowedPair<'a, 'b, A, B>
where
    A: Eq + Hash + 'a,
    B: Eq + Hash + 'b,
{
    fn a(&self) -> &A {
        self.0
    }
    fn b(&self) -> &B {
        self.1
    }
}

//----------------------------------------------------------------

#[derive(Eq, PartialEq, Hash)]
struct A(&'static str);

#[derive(Eq, PartialEq, Hash)]
struct B(&'static str);

fn main() {
    let mut table = Table::new();
    table.set(A("abc"), B("def"), 4.0);
    table.set(A("123"), B("456"), 45.0);
    println!("{:?} == 45.0?", table.get(&A("123"), &B("456")));
    println!("{:?} == 4.0?", table.get(&A("abc"), &B("def")));
    // Should panic below.
    println!("{:?} == NaN?", table.get(&A("123"), &B("def")));
}

Explanation:

  1. We introduced the Pair and BorrowedPair types. We can't use (A, B) directly due to the orphan rule E0210. This is fine since the map is an implementation detail.

  2. The KeyPair trait takes the role of Q we mentioned above. We'd need to impl Eq + Hash for KeyPair, but Eq and Hash are both not object safe. We add the a() and b() methods to help implementing them manually.

  3. Now we implement the Borrow trait from Pair<A, B> to KeyPair + 'a. Note the 'a — this is a subtle bit that is needed to make Table::get actually work. The arbitrary 'a allows us to say that a Pair<A, B> can be borrowed to the trait object for any lifetime. If we don't specify the 'a, the unsized trait object will default to 'static, meaning the Borrow trait can only be applied when the implementation like BorrowedPair outlives 'static, which is certainly not the case.

  4. Finally, we implement Eq and Hash. As above, we implement for KeyPair + 'a instead of KeyPair (which means KeyPair + 'static in this context).


Using trait objects will incur indirection cost when computing the hash and checking equality in get(). The cost can be eliminated if the optimizer is able to devirtualize that, but whether LLVM will do it is unknown.

An alternative is to store the map as HashMap<(Cow<A>, Cow<B>), f64>. Using this requires less "clever code", but there is now a memory cost to store the owned/borrowed flag as well as runtime cost in both get() and set().

Unless you fork the standard HashMap and add a method to look up an entry via Hash + Eq alone, there is no guaranteed-zero-cost solution.



回答2:

Within the get method, the values borrowed by a and b may not be adjacent to each other in memory.

[--- A ---]      other random stuff in between      [--- B ---]
 \                                                 /
  &a points to here                               &b points to here

Borrowing a value of type &(A, B) would require an A and a B that are adjacent.

     [--- A ---][--- B ---]
      \
       we could have a borrow of type &(A, B) pointing to here

A bit of unsafe code can fix this! We need a shallow copy of *a and *b.

use std::collections::HashMap;
use std::hash::Hash;
use std::mem::ManuallyDrop;
use std::ptr;

#[derive(Debug)]
pub struct Table<A: Eq + Hash, B: Eq + Hash> {
    map: HashMap<(A, B), f64>
}

impl<A: Eq + Hash, B: Eq + Hash> Table<A, B> {
    fn get(&self, a: &A, b: &B) -> f64 {
        unsafe {
            // The values `a` and `b` may not be adjacent in memory. Perform a
            // shallow copy to make them adjacent. This should be fast! This is
            // not a deep copy, so for example if the type `A` is `String` then
            // only the pointer/length/capacity are copied, not any of the data.
            //
            // This makes a `(A, B)` backed by the same data as `a` and `b`.
            let k = (ptr::read(a), ptr::read(b));

            // Make sure not to drop our `(A, B)`, even if `get` panics. The
            // caller or whoever owns `a` and `b` will drop them.
            let k = ManuallyDrop::new(k);

            // Deref `k` to get `&(A, B)` and perform lookup.
            let v = self.map.get(&k);

            // Turn `Option<&f64>` into `f64`.
            *v.unwrap()
        }
    }

    fn set(&mut self, a: A, b: B, v: f64) {
        self.map.insert((a, b), v);
    }
}

fn main() {
    let mut table = Table { map: HashMap::new() };
    table.set(true, true, 1.0);
    table.set(true, false, 2.0);

    println!("{:#?}", table);

    let v = table.get(&true, &true);
    assert_eq!(v, 1.0);
}


回答3:

A Memory trait that take two keys, set by value and get by reference:

trait Memory<A: Eq + Hash, B: Eq + Hash> {

    fn get(&self, a: &A, b: &B) -> Option<&f64>;

    fn set(&mut self, a: A, b: B, v: f64);
}

You can impl such trait using a Map of Maps:

pub struct Table<A: Eq + Hash, B: Eq + Hash> {
    table: HashMap<A, HashMap<B, f64>>,
}   

impl<A: Eq + Hash, B: Eq + Hash> Memory<A, B> for Table<A, B> {

    fn get(&self, a: &A, b: &B) -> Option<&f64> {
        self.table.get(a)?.get(b)
    }

    fn set(&mut self, a: A, b: B, v: f64) {
        let inner = self.table.entry(a).or_insert(HashMap::new());
        inner.insert(b, v);
    }
}

Please note that if the solution is somewhat elegant the memory footprint of an HashMap of HashMaps have to be considered when thousands of HashMap instances has to be managed.

Full example



标签: hashmap rust