What is String interning in Java, when should I use it, and why?
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()
Basically, calling String.intern() on a series of strings ensures that all strings with the same contents share the same memory. So if you have a list of names where 'john' appears 1000 times, by interning you ensure that only one 'john' is actually allocated memory.
This can be useful to reduce the memory requirements of your program. But be aware that the cache is maintained by the JVM in the permanent memory pool (PermGen), which is usually limited in size compared to the heap, so you should not use intern unless you have many duplicate values.
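As a rough illustration of the 'john' case, here is a minimal sketch. The class and method names are made up for this example, and new String("john") stands in for names that arrive at run time (for instance from a file). It counts distinct objects by identity rather than by content:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class InternNamesDemo {
    /** Counts distinct String objects by identity (==), not by content. */
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    /** Builds a list of equal names, optionally interning each one. */
    static List<String> buildNames(int count, boolean intern) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // new String(...) always creates a fresh object, like a name parsed from input.
            String name = new String("john");
            names.add(intern ? name.intern() : name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(buildNames(1000, false))); // 1000
        System.out.println(distinctInstances(buildNames(1000, true)));  // 1
    }
}
```

Without interning, the list keeps 1000 separate objects alive. With interning, every element refers to the single pooled instance and the temporary copies become garbage.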
More on memory constraints of using intern()
-- From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
From JDK 7 (I mean in HotSpot), something has changed.
-- From Java SE 7 Features and Enhancements
Update: Interned strings are stored in main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes
JLS
JLS 7 3.10.5 defines it and gives a practical example:
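A condensed sketch in the spirit of that JLS example, reduced to a single class (the class and method names are assumptions for this illustration, not the JLS listing itself):

```java
import java.util.Arrays;

final class JlsLiteralDemo {
    /** Returns the four comparison results in the order described in the comments. */
    static boolean[] compare() {
        String hello = "Hello";
        String lo = "lo"; // not final, so "Hel" + lo is not a compile-time constant
        return new boolean[] {
            hello == "Hello",              // same literal, same interned instance
            hello == ("Hel" + "lo"),       // constant expression, folded and interned at compile time
            hello == ("Hel" + lo),         // concatenated at run time, a distinct object
            hello == ("Hel" + lo).intern() // explicitly interned, same instance again
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, true, false, true]
    }
}
```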
JVMS
JVMS 7 5.1 says that interning is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects which have more generic representations):
Bytecode
Let's decompile some OpenJDK 7 bytecode to see interning in action.
If we decompile:
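A sketch of a class that fits the notes that follow, assuming two literals a and b and a third string c built with new String (the class name is hypothetical). After compiling it, javap -c -v on the class file lists the constant pool and the instructions of main; the exact offsets and constant pool indices may differ between compiler versions:

```java
final class StringPoolDemo {
    public static void main(String[] args) {
        String a = "abc";             // literal, loaded from the constant pool with ldc
        String b = "abc";             // same literal, same constant pool entry
        String c = new String("abc"); // a new instance, with the literal as constructor argument
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == c);   // reference comparison, prints false
    }
}
```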
we have on the constant pool:
and main:
Note how:
- 0 and 3: the same ldc #2 constant is loaded (the literals)
- 12: a new string instance is created (with #2 as argument)
- 35: a and c are compared as regular objects with if_acmpne
The representation of constant strings is quite magic on the bytecode:
- it has a dedicated CONSTANT_String_info structure, unlike regular objects (e.g. new String)
and the JVMS quote above seems to say that whenever the Utf8 pointed to is the same, then identical instances are loaded by ldc.
I have done similar tests for fields, and:
- static final String s = "abc" points to the constant table through the ConstantValue attribute
- non-final fields do not have that attribute, but can still be initialized with ldc
Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.
Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
There are some "catchy interview" questions about why you get
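A sketch of the kind of snippet such questions use, assuming two variables initialized with the same "testString" literal (the class, method, and variable names are hypothetical):

```java
final class InterviewQuestionDemo {
    /** Compares two identical literals by reference. */
    static String compareLiterals() {
        String first = "testString";
        String second = "testString";
        // Both variables refer to the same interned instance, so == is true here.
        return (first == second) ? "equals" : "not equals";
    }

    public static void main(String[] args) {
        System.out.println(compareLiterals()); // prints "equals"
    }
}
```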
If you need to compare strings, you should use equals(). The above will print equals, because the testString is already interned for you by the compiler. You can intern strings yourself using the intern method, as shown in previous answers.
Update for Java 8 and later: in Java 8, the PermGen (Permanent Generation) space is removed and replaced by Metaspace. The String pool lives in the JVM heap (it was moved there in Java 7).
Compared with Java 7, the String pool size is increased in the heap. Therefore, you have more space for interned strings, but you have less memory for the rest of the application.
One more thing: as you already know, when comparing two object references in Java, '==' compares the references, while 'equals' compares the contents of the objects.
Let's check this code:
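A minimal sketch consistent with the results listed next, assuming value1 and value2 are assigned the same literal and value3 is created with new String (the class name and the string content are hypothetical):

```java
final class InternComparisonDemo {
    public static void main(String[] args) {
        String value1 = "70";             // literal, taken from the string pool
        String value2 = "70";             // same literal, same pooled instance
        String value3 = new String("70"); // equal content, but a separate object

        System.out.println("value1 == value2 ---> " + (value1 == value2));
        System.out.println("value1 == value3 ---> " + (value1 == value3));
        System.out.println("value1.equals(value3) ---> " + value1.equals(value3));
        System.out.println("value1 == value3.intern() ---> " + (value1 == value3.intern()));
    }
}
```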
Result:
value1 == value2 ---> true
value1 == value3 ---> false
value1.equals(value3) ---> true
value1 == value3.intern() ---> true
That's why you should use 'equals' to compare two String objects. And that is how intern() is useful.
String interning is an optimization technique by the compiler. If you have two identical string literals in one compilation unit, then the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.
I am from a C# background, so I can explain by giving an example from that:
output of the following comparisons:
Note 1: Objects are compared by reference.
Note 2: typeof(int).Name is evaluated by reflection, so it is not evaluated at compile time. Here these comparisons are made at compile time.
Analysis of the results:
1) true, because they both contain the same literal, so the generated code will have only one object referencing "Int32". See Note 1.
2) true, because the content of both values is compared, and it is the same.
3) false, because str2 and obj do not have the same literal. See Note 2.