Brief summary
I want to build a set of sets of items in C#. The inner sets of items have a GetHashCode
and Equals
method defined by their contents. In mathematical notation:
x = { }
x.Add( { A, B, C } )
x.Add( { A, D } )
x.Add( { B, C, A } )
now x should be{ { A, B, C }, { A, D } }
In python, this could be accomplished with frozenset
:
x = set()
x.add( frozenset(['A','B','C']) )
x.add( frozenset(['A','D']) )
x.add( frozenset(['B','C','A']) )
/BriefSummary
I would like to have a hashable HashSet in C#. This would allow me to do:
HashSet<ContentHashableHashSet<int>> setOfSets;
Although there are more sophisticated ways to accomplish this, This can be trivially achieved in practice (although not in the most efficient manner) by adding overriding ContentHashableHashSet.ToString()
(outputing the strings of the elements contained in sorted order) and then using then using ContentHashableHashSet.ToString().GetHashCode()
as the hash code.
However, if an object is modified after placement in setOfSets
, it could result in multiple copies:
var setA = new ContentHashableHashSet<int>();
setA.Add(1);
setA.Add(2);
var setB = new ContentHashableHashSet<int>();
setB.Add(1);
setOfSets.Add(setA);
setOfSets.Add(setB);
setB.Add(2); // now there are duplicate members!
As far as I can see, I have two options: I can derive ContentHashableHashSet
from HashSet
, but then I will need to make it so that all modifiers throw an exception. Missing one modifier could cause an insidious bug.
Alternatively, I can use encapsulation and class ContentHashableHashSet
can contain a readonly HashSet
. But then I would need to reimplement all set methods (except modifiers) so that the ContentHashableHashSet
can behave like a HashSet
. As far as I know, extensions would not apply.
Lastly, I could encapsulate as above and then all set-like functionality will occur by returning the const (or readonly?) HashSet member.
In hindsight, this is reminiscent of python's frozenset
. Does anyone know of a well-designed way to implement this in C#?
If I was able to lose ISet
functionality, then I would simply create a sorted ImmutableList
, but then I would lose functionality like fast union, fast intersection, and sub-linear ( roughly O(log(n)) ) set membership with Contains
.
EDIT: The base class HashSet does not have virtual Add
and Remove
methods, so overriding them will work within the derived class, but will not work if you perform HashSet<int> set = new ContentHashableHashSet<int>();
. Casting to the base class will allow editing.
EDIT 2: Thanks to @xanatos for recommending a simple GetHashCode
implementation:
The easiest way to calculate the GetHashCode is to simply xor (^) all the gethashcodes of the elements. The xor operator is commutative, so the ordering is irrelevant. For the comparison you can use the SetEquals
EDIT 3: Someone recently shared information about ImmutableHashSet, but because this class is sealed, it is not possible to derive from it and override GetHashCode
.
I was also told that HashSet
takes an IEqualityComparer
as an argument, and so this can be used to provide an immutable, content-hashable set without deriving from ImmutableHashSet; however, this is not a very object oriented solution: every time I want to use a ContentHashableHashSet
, I will need to pass the same (non-trivial) argument. As I'm sure you know, this can really wreak havoc with your coding zen, and where I would be flying along in python with myDictionary[ frozenset(mySet) ] = myValue
, I will be stuck doing the same thing again and again and again.
Thanks for any help you can provide. I have a temporary workaround (whose problems are mentioned in EDIT 1 above), but I'd mostly like to learn about the best way to design something like this.
Hide the elements of your set of sets so that they can't be changed. That means copying when you add/retrieve sets, but maybe that's acceptable?