Best way to remove elements of Vec depending on ot

2019-02-18 03:14发布

问题:

I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:

a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}

In this case I would like to remove b, because it's a subset of a. I'm fine with using a "dumb" n² algorithm.

Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):

let mut v: Vec<HashSet<u8>> = vec![];

let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(&set_b) {
            to_delete.push(i);
            break;
        }
    }
}

for i in to_delete {
    v.swap_remove(i);
}

(note: the code above is not correct! See comments for further details)

I see a few disadvantages:

  • I need an additional vector with additional allocations
  • Maybe there are more efficient ways than calling swap_remove often
  • If I need to preserve order, I can't use swap_remove, but have to use remove which is slow

Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.

回答1:

Here is a solution that does not make additional allocations and preserves the order:

fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
    where F: FnMut(&T, &T) -> bool
{
    let mut j = 0;
    for i in 0..v.len() {
        // invariants:
        // items v[0..j] will be kept
        // items v[j..i] will be removed
        if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
            v.swap(i, j);
            j += 1;
        }
    }
    v.truncate(j);
}

fn main() {
    // test with a simpler example
    // unique elements
    let mut v = vec![1, 2, 3];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![1, 2, 3], v);

    let mut v = vec![1, 3, 2, 4, 5, 1, 2, 4];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![3, 5, 1, 2, 4], v);
}

This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.



回答2:

  • I need an additional vector with additional allocations

I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.

  • Maybe there are more efficient ways than calling swap_remove often.
  • If I need to preserve order, I can't use swap_remove, but have to use remove which is slow

I'd change to_delete from Vec<usize> to Vec<bool> and just mark whether a particular hashmap should be removed. You can then use the Vec::retain, which conditionaly removes elements while preserving order. Unfortunately, this function doesn't pass the index to the closure, so we have to create a workaround (playground):

let mut to_delete = vec![false; v.len()];
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(&set_b) {
            to_delete[i] = true;
        }
    }
}

{
    // This assumes that retain checks the elements in the order.
    let mut i = 0;
    v.retain(|_| {
        let ret = !to_delete[i];
        i += 1;
        ret
    });
}

If your hashmap has a special value which can never occur under normal conditions, you can use it to mark a hashmap as "to delete", and then check that condition in retain (it would require changing the outer loop from iterator-based to range-based though).


Sidenote (if that HashSet<u8> is not just a toy example): More eficient way to store and compare sets of small integers would be to use a bitset.



回答3:

You can use a while loop instead of the for:

use std::collections::HashSet;

fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1,2,3],
        &[1,3],
        &[1,4],
        &[2,3]
    ];

    let mut v:Vec<HashSet<u8>> = arr.iter()
        .map(|x| x.iter().cloned().collect())
        .collect();

    let mut pos = 0;
    while pos < v.len() {
        let is_sub = v[pos+1..].iter().any(|x| v[pos].is_subset(x)) 
            || v[..pos].iter().any(|x| v[pos].is_subset(x));

        if is_sub {
            v.swap_remove(pos);
        } else {
            pos+=1;
        }
    }
    println!("{:?}", v);
}

There are no additional allocations.


To avoid using remove and swap_remove, you can change the type of vector to Vec<Option<HashSet<u8>>>:

use std::collections::HashSet;

fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1,2,3],
        &[1,3],
        &[1,4],
        &[2,3]
    ];

    let mut v:Vec<Option<HashSet<u8>>> = arr.iter()
        .map(|x| Some(x.iter().cloned().collect()))
        .collect();

    for pos in 0..v.len(){
        let is_sub = match v[pos].as_ref() {
            Some(chk) => 
                v[..pos].iter().flat_map(|x| x).any(|x| chk.is_subset(x)) 
                ||  v[pos+1..].iter().flat_map(|x| x).any(|x| chk.is_subset(x)),
            None => false,
        };

        if is_sub { v[pos]=None };//Replace with None instead remove

    }
    println!("{:?}", v);//[None, Some({3, 2, 1}), None, Some({1, 4}), None]
}