I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:
a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}
In this case I would like to remove b
, because it's a subset of a
. I'm fine with using a "dumb" n² algorithm.
Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):
let mut v: Vec<HashSet<u8>> = vec![];
let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
for set_b in &v[..i] {
if set_a.is_subset(&set_b) {
to_delete.push(i);
break;
}
}
}
for i in to_delete {
v.swap_remove(i);
}
(note: the code above is not correct! See comments for further details)
I see a few disadvantages:
- I need an additional vector with additional allocations
- Maybe there are more efficient ways than calling
swap_remove
often - If I need to preserve order, I can't use
swap_remove
, but have to useremove
which is slow
Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.
You can use a
while
loop instead of thefor
:There are no additional allocations.
To avoid using
remove
andswap_remove
, you can change the type of vector toVec<Option<HashSet<u8>>>
:Here is a solution that does not make additional allocations and preserves the order:
This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.
I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.
I'd change
to_delete
fromVec<usize>
toVec<bool>
and just mark whether a particular hashmap should be removed. You can then use theVec::retain
, which conditionaly removes elements while preserving order. Unfortunately, this function doesn't pass the index to the closure, so we have to create a workaround (playground):If your hashmap has a special value which can never occur under normal conditions, you can use it to mark a hashmap as "to delete", and then check that condition in
retain
(it would require changing the outer loop from iterator-based to range-based though).Sidenote (if that
HashSet<u8>
is not just a toy example): More eficient way to store and compare sets of small integers would be to use a bitset.