In some situations involving immutable objects, it will be possible for many distinct objects to come into existence which are semantically identical. A simple example would be reading many lines of text from a file into strings. From the program's perspective, the fact that two lines have the same sequence of characters would be "coincidence", but from the programmer's perspective a large amount of duplication may be expected. If many string instances are identical, changing the references to those distinct instances into references to a single instance will save memory, and will also facilitate comparisons between them (if two string references point to the same string, there's no need to do a character-by-character comparison to determine that they are identical).
For some scenarios, the system-provided string interning facility may be useful. It has, however, a couple of severe limitations:
- Once a string is interned, that interned copy will live forever, whether or not there exists any other reference to it
- The string interning facility only works with strings, and is not usable with any other immutable types.
If there existed a true WeakDictionary<ImmutableClassType, ImmutableClassType>
(for each element, the key and value would be identical), code could do something like:
if (theDict.TryGetValue(myString, ref internedString))
myString = internedString;
else
theDict[myString] = myString;
Unfortunately, I am unaware of any built-in WeakDictionary<keyType, valType>
class in .net. Further, it would seem wasteful to generate a weak reference for each item's key and value, when both references would always point to the same thing.
I've read some about ConditionalWeakTable
, and that sounds like an interesting class, but I don't think it would be usable here, since the goal is to be able to take one instance and find another independent instance which is semantically equivalent.
For situations where instances of a class will have a well-defined lifetime, it may be reasonable to use a conventional Dictionary
to merge identical instances. In many cases, however, it may be difficult to know when such a Dictionary
should be abandoned or the items within it cleaned out. A WeakReference
-based interning collection would avoid such issues. Does such a thing exist, or could it be readily implemented?
Addendum
As svick noted, a Dictionary<WeakReference, WeakReference>
would be somewhat problematical as there would be no practical way to define an IEqualityComparer
which would have a live WeakReference
return the GetHashCode
value of its target, and have a dead one continue to return that value. One could define a struct which would contain an integer target-hashcode value (set in the constructor), and whose own GetHashCode
would return that integer. A slight improvement might be to use a ConditionalWeakTable
to link the target of the WeakReference
to a finalizable object which could be used to enqueue table items for deletion.
I'm not sure what the proper balance is between trying to eagerly cleaning out the dictionary, versus taking a somewhat more passive approach (e.g. perform a sweep when adding an item if there's been at least one GC since the last sweep, and the number of items added since the last sweep exceeds the number of items that survived it). Sweeping through everything in the dictionary isn't going to be free, but ConditionalWeakTable probably isn't going to be free either.
PPS
Another notion I was thinking of, but I figured it probably wouldn't be as useful as a weak-interning approach, would be to have a logically-immutable type hold a mutable "timestamp" value, and have a comparison method which accepts its arguments by ref
. If two different instances are found to be equal, their timestamp values would be examined. If both zero, they would be assigned consecutive negative numbers from a global counter (-1, -2, -3, etc.). The parameter which had (or was assigned) the lower timestamp value would then be replaced by the other.
Using such an approach, if many object instances were repeatedly compared against each other, many of the references would be replaced with references to "older" instances. Depending upon usage patterns, this may result in most of the identical object instances being merged without the use of any sort of interning dictionary. Applying such an approach with nested objects, however, would require that "immutable" objects allow nested-object references to be mutated to point to other supposedly-identical nested objects. This should be fine if "supposedly-identical" objects always are, but could cause rather bizarre misbehavior if not.