Implementing a content-hashable HashSet in C# (lik

2019-09-02 10:49发布

问题:

Brief summary

I want to build a set of sets of items in C#. The inner sets of items have a GetHashCode and Equals method defined by their contents. In mathematical notation:

x = { }
x.Add( { A, B, C } )
x.Add( { A, D } )
x.Add( { B, C, A } )

now x should be{ { A, B, C }, { A, D } }

In python, this could be accomplished with frozenset:

x = set()
x.add( frozenset(['A','B','C']) )
x.add( frozenset(['A','D']) )
x.add( frozenset(['B','C','A']) )

/BriefSummary

I would like to have a hashable HashSet in C#. This would allow me to do:

HashSet<ContentHashableHashSet<int>> setOfSets;

Although there are more sophisticated ways to accomplish this, This can be trivially achieved in practice (although not in the most efficient manner) by adding overriding ContentHashableHashSet.ToString() (outputing the strings of the elements contained in sorted order) and then using then using ContentHashableHashSet.ToString().GetHashCode() as the hash code.

However, if an object is modified after placement in setOfSets, it could result in multiple copies:

var setA = new ContentHashableHashSet<int>();
setA.Add(1);
setA.Add(2);
var setB = new ContentHashableHashSet<int>();
setB.Add(1);

setOfSets.Add(setA);
setOfSets.Add(setB);

setB.Add(2); // now there are duplicate members!

As far as I can see, I have two options: I can derive ContentHashableHashSet from HashSet, but then I will need to make it so that all modifiers throw an exception. Missing one modifier could cause an insidious bug.

Alternatively, I can use encapsulation and class ContentHashableHashSet can contain a readonly HashSet. But then I would need to reimplement all set methods (except modifiers) so that the ContentHashableHashSet can behave like a HashSet. As far as I know, extensions would not apply.

Lastly, I could encapsulate as above and then all set-like functionality will occur by returning the const (or readonly?) HashSet member.

In hindsight, this is reminiscent of python's frozenset. Does anyone know of a well-designed way to implement this in C#?

If I was able to lose ISet functionality, then I would simply create a sorted ImmutableList, but then I would lose functionality like fast union, fast intersection, and sub-linear ( roughly O(log(n)) ) set membership with Contains.

EDIT: The base class HashSet does not have virtual Add and Remove methods, so overriding them will work within the derived class, but will not work if you perform HashSet<int> set = new ContentHashableHashSet<int>();. Casting to the base class will allow editing.

EDIT 2: Thanks to @xanatos for recommending a simple GetHashCode implementation:

The easiest way to calculate the GetHashCode is to simply xor (^) all the gethashcodes of the elements. The xor operator is commutative, so the ordering is irrelevant. For the comparison you can use the SetEquals

EDIT 3: Someone recently shared information about ImmutableHashSet, but because this class is sealed, it is not possible to derive from it and override GetHashCode.

I was also told that HashSet takes an IEqualityComparer as an argument, and so this can be used to provide an immutable, content-hashable set without deriving from ImmutableHashSet; however, this is not a very object oriented solution: every time I want to use a ContentHashableHashSet, I will need to pass the same (non-trivial) argument. As I'm sure you know, this can really wreak havoc with your coding zen, and where I would be flying along in python with myDictionary[ frozenset(mySet) ] = myValue, I will be stuck doing the same thing again and again and again.

Thanks for any help you can provide. I have a temporary workaround (whose problems are mentioned in EDIT 1 above), but I'd mostly like to learn about the best way to design something like this.

回答1:

Hide the elements of your set of sets so that they can't be changed. That means copying when you add/retrieve sets, but maybe that's acceptable?

// Better make sure T is immutable too, else set hashes could change
public class SetofSets<T>
{
    private class HashSetComparer : IEqualityComparer<HashSet<T>>
    {
        public int GetHashCode(HashSet<T> x)
        {
            return x.Aggregate(1, (code,elt) => code ^ elt.GetHashCode());
        }

        public bool Equals(HashSet<T> x, HashSet<T> y)
        {
            if (x==null)
                return y==null;
            return x.SetEquals(y);
        }
    }

    private HashSet<HashSet<T>> setOfSets;
    public SetofSets()
    {
        setOfSets = new HashSet<HashSet<T>>(new HashSetComparer());
    }

    public void Add(HashSet<T> set)
    {
        setOfSets.Add(new HashSet<T>(set));
    }

    public bool Contains(HashSet<T> set)
    {
        return setOfSets.Contains(set);
    }
}