Does the implementation of Java's String memory pool follow the flyweight pattern?
I have this doubt because I see no extrinsic state involved in interning. In GoF I read that there should be the right balance between intrinsic and extrinsic state, but with intern everything is intrinsic.
Or can we say there is no strict rule about attributes, and that merely sharing objects to reduce memory is sufficient to call it a flyweight?
Please help me understand.
Yes, the String.intern() implementation follows the flyweight pattern.
As the Javadoc says:
Returns a canonical representation for the string object. A pool of
strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
In older JVMs (up to Java 6) the interned strings reside in the "Perm Gen" space; since Java 7 the pool lives in the main heap. On string objects returned by .intern() you can use the == operator, because .intern() always returns the same object for equal values.
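A minimal sketch of the behaviour the Javadoc describes. The class name InternDemo and the helper sameAfterIntern are made up for this illustration; only String.intern(), equals() and == come from the JDK.

```java
final class InternDemo {
    /** True when both arguments resolve to the very same pooled object. */
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "flyweight";            // literals are interned automatically
        String copy = new String("flyweight");   // forces a distinct object with equal contents

        System.out.println(literal == copy);            // false: two different objects
        System.out.println(literal.equals(copy));       // true: same characters
        System.out.println(literal == copy.intern());   // true: intern() returns the pooled instance
        System.out.println(sameAfterIntern(literal, copy)); // true
    }
}
```

Note that == is only safe here because both sides went through the pool; comparing an interned string with a non-interned one still needs equals().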
Also remember that the .intern() method does not produce leaks, because modern JVMs are able to garbage-collect the pool.
Try to read this article too.
Irrespective of interning, Java String utilizes the flyweight pattern by sharing the char[] between a string and those derived from it via substring and similar method calls. This has a flipside, though: if you take a small substring of a huge string, the huge char[] will not be eligible for garbage collection.
Note: as of OpenJDK version 1.7.0_06 the above has become obsolete: the code was changed so that the char[] is no longer shared between instances. substring() creates a new array.
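The difference between the two substring strategies can be sketched with a simplified model. SharedSlice is a hypothetical class written for this illustration, not the actual java.lang.String source; it only mimics the idea of an array plus offset and count.

```java
import java.util.Arrays;

final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of the first character of this slice
    private final int count;     // number of characters in this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Old style: the result points into the same array, so the array stays reachable. */
    SharedSlice sharingSubstring(int begin, int end) {
        checkRange(begin, end);
        return new SharedSlice(value, offset + begin, end - begin);
    }

    /** New style: the result owns a fresh array holding only its own characters. */
    SharedSlice copyingSubstring(int begin, int end) {
        checkRange(begin, end);
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SharedSlice(copy, 0, copy.length);
    }

    private void checkRange(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", end=" + end);
        }
    }

    boolean sharesArrayWith(SharedSlice other) {
        return value == other.value;
    }

    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

With the sharing variant, a three-character slice of a huge string keeps the whole backing array alive; with the copying variant it holds only three characters, at the cost of an allocation per call.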
You have correctly identified that both Interning and Flyweight are based on the same idea: caching and sharing common state.
With a Flyweight, in the extreme case when there is no extrinsic state to store, only the pointer to the intrinsic state remains. Then there is no need for the extrinsic state to even be an object, the pointer itself can be the extrinsic state. That's when Flyweight has become Interning.
Whether Interning "really" is or is not a kind of Flyweight is just a debate over definitions. What matters most is the understanding of how one can be viewed as a specialized instance of the other, so you're good.
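For contrast, here is a small sketch of a flyweight that does have extrinsic state, loosely modelled on the glyph example from the GoF book. The Glyph class, its of() factory and render() method are names invented for this illustration. The character is the shared intrinsic state; the position is extrinsic and is supplied by the caller on every call.

```java
import java.util.HashMap;
import java.util.Map;

final class Glyph {
    // Flyweight factory: one shared Glyph per character.
    private static final Map<Character, Glyph> POOL = new HashMap<>();

    private final char symbol;  // intrinsic state, independent of context

    private Glyph(char symbol) {
        this.symbol = symbol;
    }

    static Glyph of(char symbol) {
        return POOL.computeIfAbsent(symbol, c -> new Glyph(c));
    }

    /** row and column are extrinsic state: passed in, never stored in the flyweight. */
    String render(int row, int column) {
        return symbol + "@" + row + ":" + column;
    }

    static int poolSize() {
        return POOL.size();
    }

    public static void main(String[] args) {
        Glyph first = Glyph.of('a');
        Glyph second = Glyph.of('a');
        System.out.println(first == second);        // true: one shared object
        System.out.println(first.render(0, 1));     // a@0:1
        System.out.println(second.render(3, 7));    // a@3:7
    }
}
```

If render() took no arguments, nothing extrinsic would remain and Glyph.of() would behave just like an intern pool for characters, which is the degenerate case described above.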
As others have stated, String.intern() is all about caching. It returns a reference to the equal string already stored in the pool. In this way it is somewhat similar to the flyweight pattern, as it reuses existing objects, resulting in lower memory consumption and increased performance (though intern has its own performance overhead from the lookup in the string pool). Hence the two can appear to be similar, but they actually are not.
No, sharing objects to reduce memory is insufficient to call it a flyweight. In other words, caching is not automatically the flyweight pattern.
I think it would be fair to say that flyweight is a special form of caching, i.e. partial caching; but do note the GoF book does not use the words "cache" or "caching" anywhere in the flyweight chapter (though the terms are used in both the previous and subsequent chapters, facade and proxy, respectively).
A couple of comments in this thread are worth repeating, because they succinctly answer the overall question.
If there is no extrinsic context for your objects, then you are just caching. The whole reason the Flyweight pattern is even useful to define, is that people often forget they can at least cache a part of the object that is independent of context and share it.
--C S
Flyweight is about sharing the object internals. Interning is just caching the whole objects.
--Marko Topolnik
But let's compare String interning to the criteria that the GoF have defined (on page 197).
Apply the Flyweight pattern when all of the following are true:
- An application uses a large number of objects.
- Storage costs are high because of the sheer quantity of objects.
- Most object state can be made extrinsic.
- Many groups of objects may be replaced by relatively few shared objects once extrinsic state is removed.
- The application doesn't depend on object identity. Since flyweight objects may be shared, identity tests will return true for conceptually distinct objects.
- Clearly, many applications use a large number of Strings, so this criterion passes.
- Storing Strings is expensive, at least compared to primitive types, so let's give this criterion a pass.
- Here's where we get tripped up: none of a String's state is made extrinsic. This criterion fails.
- If we're generous and ignore the part about extrinsic state, we could give this criterion a pass as well, since Strings do tend to be reused.
- Anyone who's ever used == to compare Strings in Java knows not to depend on object identity, so this criterion passes.
Well 4/5 passing criteria is pretty good right? Shouldn't that be enough to say that interning/caching and flyweight are the same? No: similar != same. The emphasis on the word all in the GoF quote is theirs, not mine. There is naturally a strong desire to label as many implementations as possible with GoF pattern names, because doing so lends legitimacy to those implementations. (The most egregious cases are the factory patterns, which you can easily find labeling every kind of creational code imaginable; but I digress.) If the patterns are not held to their published definitions, they overlap and lose meaning, defeating a large part of their purpose (common vocabulary).
Lastly, let's analyze the first sentence of the flyweight chapter: what the GoF defines as the Intent of the flyweight pattern.
Use sharing to support large numbers of fine-grained objects efficiently.
I submit that an object with no extrinsic state is not fine-grained, but rather the opposite; so here is a suggested Intent for caching: Use caching to support large numbers of coarse-grained objects efficiently.
Clearly there is similarity between String interning/caching and the Flyweight Pattern; but they are not the same.
Flyweight is about sharing an object's immutable internals. Interning is just caching whole objects.