Any weak interning collections (for immutable obje

2020-02-26 00:43发布

问题:

In some situations involving immutable objects, it will be possible for many distinct objects to come into existence which are semantically identical. A simple example would be reading many lines of text from a file into strings. From the program's perspective, the fact that two lines have the same sequence of characters would be "coincidence", but from the programmer's perspective a large amount of duplication may be expected. If many string instances are identical, changing the references to those distinct instances into references to a single instance will save memory, and will also facilitate comparisons between them (if two string references point to the same string, there's no need to do a character-by-character comparison to determine that they are identical).

For some scenarios, the system-provided string interning facility may be useful. It has, however, a couple of severe limitations:

  1. Once a string is interned, that interned copy will live forever, whether or not there exists any other reference to it
  2. The string interning facility only works with strings, and is not usable with any other immutable types.

If there existed a true WeakDictionary<ImmutableClassType, ImmutableClassType> (for each element, the key and value would be identical), code could do something like:

if (theDict.TryGetValue(myString, ref internedString))
  myString = internedString;
else
  theDict[myString] = myString;

Unfortunately, I am unaware of any built-in WeakDictionary<keyType, valType> class in .net. Further, it would seem wasteful to generate a weak reference for each item's key and value, when both references would always point to the same thing.

I've read some about ConditionalWeakTable, and that sounds like an interesting class, but I don't think it would be usable here, since the goal is to be able to take one instance and find another independent instance which is semantically equivalent.

For situations where instances of a class will have a well-defined lifetime, it may be reasonable to use a conventional Dictionary to merge identical instances. In many cases, however, it may be difficult to know when such a Dictionary should be abandoned or the items within it cleaned out. A WeakReference-based interning collection would avoid such issues. Does such a thing exist, or could it be readily implemented?

Addendum As svick noted, a Dictionary<WeakReference, WeakReference> would be somewhat problematical as there would be no practical way to define an IEqualityComparer which would have a live WeakReference return the GetHashCode value of its target, and have a dead one continue to return that value. One could define a struct which would contain an integer target-hashcode value (set in the constructor), and whose own GetHashCode would return that integer. A slight improvement might be to use a ConditionalWeakTable to link the target of the WeakReference to a finalizable object which could be used to enqueue table items for deletion.

I'm not sure what the proper balance is between trying to eagerly cleaning out the dictionary, versus taking a somewhat more passive approach (e.g. perform a sweep when adding an item if there's been at least one GC since the last sweep, and the number of items added since the last sweep exceeds the number of items that survived it). Sweeping through everything in the dictionary isn't going to be free, but ConditionalWeakTable probably isn't going to be free either.

PPS Another notion I was thinking of, but I figured it probably wouldn't be as useful as a weak-interning approach, would be to have a logically-immutable type hold a mutable "timestamp" value, and have a comparison method which accepts its arguments by ref. If two different instances are found to be equal, their timestamp values would be examined. If both zero, they would be assigned consecutive negative numbers from a global counter (-1, -2, -3, etc.). The parameter which had (or was assigned) the lower timestamp value would then be replaced by the other.

Using such an approach, if many object instances were repeatedly compared against each other, many of the references would be replaced with references to "older" instances. Depending upon usage patterns, this may result in most of the identical object instances being merged without the use of any sort of interning dictionary. Applying such an approach with nested objects, however, would require that "immutable" objects allow nested-object references to be mutated to point to other supposedly-identical nested objects. This should be fine if "supposedly-identical" objects always are, but could cause rather bizarre misbehavior if not.

回答1:

You could create something like Dictionary<WeakReference, WeakReference> with a custom equality comparer and prune those that are no longer alive at appropriate times. One problem with that is how to remove a dead WeakRefrence from the dictionary, because you can't remove it by reference equality (remember, you have to use custom equality comparer) or index. Possible solutions are creating a type that inherits from WeakReference and has correct hash code even if the reference is dead. Or you could wrap it in a custom struct.

But as you said, it would be nice if each reference was removed from the dictionary immediately after it died. I think the only way to do this is to somehow use finalizers. But if you don't want to (or can't) modify the type in the dictionary, this will get quite complicated.

The basic idea is that you will have the same dictionary of weak references as above (the caveats about how to remove items still apply), but you also attach a helper object with a finalizer to each item in the dictionary using ConditionalWeakTable. And in that finalizer, you will remove the item from the dictionary. Because of how ConditionalWeakTable works, if an item in the dictionary gets GCed, the attached object will too, which means its finalizer will be called and so the item will be removed from the dictionary