If you have two instances of a String, and they are equal, in Java they will share the same memory. How is this implemented under the hood?
EDIT: My application uses a large number of String objects, many of which are identical. What is the best way to make use of the Java String constant pool, so as to avoid creating a custom flyweight implementation?
To answer your edited question, Sun JVMs have a -XX:+StringCache option, which in my observation can reduce the memory footprint of a String-heavy application significantly.
Otherwise, you have the option of interning your Strings, but I would be careful about that. Strings that are very large and no longer referenced will still use memory for the life of the JVM.
Edit (in response to comment): I first found out about the StringCache option from here:
Tom Hawtin describes some type of caching to improve some benchmarks. My observation when I enabled it for IDEA was that the memory footprint (after a full garbage collection) went way down compared with not having it. It is not a documented parameter, and may indeed just be about optimizing for some benchmarks. My observation is that it helped, but I wouldn't build an important system based on it.
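For the interning option mentioned above, here is a minimal sketch of what it buys you when many runtime-built Strings are equal. The class name InternDemo and the helper methods load and distinctInstances are made up for illustration; the only real API involved is String.intern() plus standard collections. It counts distinct object identities, not bytes, so treat it as an illustration of the idea rather than a memory benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDemo {
    // Builds n equal strings at runtime. Hypothetical helper: each
    // StringBuilder.toString() call yields a fresh String object.
    static List<String> load(int n, boolean intern) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String s = new StringBuilder("user-").append("agent").toString();
            // With intern == true, every element refers to the pooled instance.
            out.add(intern ? s.intern() : s);
        }
        return out;
    }

    // Counts distinct objects by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Set<String> ids =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        ids.addAll(strings);
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(load(1000, false)));
        System.out.println(distinctInstances(load(1000, true)));
    }
}
```

Without interning, every element is its own object; with interning, all of them collapse to a single shared instance.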
String literals are interned in Java, so there's really only one String object with multiple references (when they are equal, which is not always the case). See the java.net article All about intern() for more details.
There's also a good example/explanation in section 3.10.5 String Literals of the JLS that talks about when Strings are interned and when they'll be distinct.
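A short sketch along the lines of the JLS example, showing which Strings end up being the same object. The class and method names here are my own, not from the JLS. Note that == is used on purpose to compare identity; in normal code you would use equals().

```java
class LiteralDemo {
    static final String HELLO = "Hello";

    // Two equal literals refer to the same interned object.
    static boolean sameLiteral() {
        String a = "Hello";
        return a == HELLO;
    }

    // "Hel" + "lo" is a compile-time constant expression, so it is interned too.
    static boolean constantExpression() {
        return ("Hel" + "lo") == HELLO;
    }

    // lo is a non-final local, so the concatenation happens at run time
    // and produces a new, distinct String object.
    static boolean runtimeConcat() {
        String lo = "lo";
        return ("Hel" + lo) == HELLO;
    }

    // Explicitly interning the run-time result gives back the pooled object.
    static boolean internedConcat() {
        String lo = "lo";
        return ("Hel" + lo).intern() == HELLO;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());
        System.out.println(constantExpression());
        System.out.println(runtimeConcat());
        System.out.println(internedConcat());
    }
}
```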
Look at the source code of java.lang.String (the source for the entire Java API is part of the JDK).
To summarize: A String wraps a subsequence of a char[]. That backing char[] is never modified. This is accomplished by neither leaking nor capturing this char[] outside the String class. However, several Strings can share the same char[] (see the implementation of String.substring; this applies to older JDKs, since as of Java 7 update 6 substring copies the characters instead).
There is also the mechanism of interning, as explained in the other answers.
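The "neither leaking nor capturing" part can be observed from the outside without reading the source. A minimal sketch, with made-up class and method names: the constructor copies the array it is given, and toCharArray() hands back a copy, so mutating either array leaves the String untouched.

```java
class ImmutableDemo {
    // No capture: the String copies the array, so later writes
    // to the caller's array are not visible through the String.
    static String build(char[] source) {
        String s = new String(source);
        source[0] = 'X';
        return s;
    }

    // No leak: toCharArray() returns a fresh copy, so writing to it
    // cannot change the String's internal state.
    static String leakAttempt(String s) {
        char[] copy = s.toCharArray();
        copy[0] = 'X';
        return s;
    }

    public static void main(String[] args) {
        System.out.println(build(new char[] {'a', 'b', 'c'}));
        System.out.println(leakAttempt("abc"));
    }
}
```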
This is actually not 100% true.
This blog post is a decent explanation of why this is so, and what the String constant pool is.
Two things to be careful about:
1. Do not use the new String("abc") constructor; just use the literal "abc".
2. intern() always returns strings that are pooled.
That's not necessarily true. Example:
but:
Now the second form is discouraged. Some (including me) think that String shouldn't even have a public constructor. A better version of the above would be:
Obviously you don't need to do this for a constant String. It's illustrative.
The important point about this is that if you're passed a String or get one from a function, you can't rely on the String being canonical. A canonical Object satisfies this equality:
for non-null instances a, b of a given Class.
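To make the cases discussed in this answer concrete, here is a minimal sketch comparing two literals, two Strings created with new String(...), and the interned variant. The class and method names are made up for illustration. The agree method assumes that the canonical property means a.equals(b) and a == b always give the same answer; that is one common reading, not necessarily the exact formula the answer had in mind.

```java
class CanonicalDemo {
    // Two equal literals: same pooled object.
    static boolean literals() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // Two explicitly constructed Strings: equal, but distinct objects.
    static boolean constructed() {
        String a = new String("hello");
        String b = new String("hello");
        return a == b;
    }

    // Interning each constructed String maps both to the pooled object.
    static boolean interned() {
        String a = new String("hello").intern();
        String b = new String("hello").intern();
        return a == b;
    }

    // Assumed canonical check: equals() and == agree for this pair.
    static boolean agree(String a, String b) {
        return a.equals(b) == (a == b);
    }

    public static void main(String[] args) {
        System.out.println(literals());
        System.out.println(constructed());
        System.out.println(interned());
        System.out.println(agree(new String("hello"), new String("hello")));
    }
}
```

So if a String comes from somewhere you don't control, compare with equals(), or intern it first if you really need identity comparison.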