I'm trying to find a third solution to this question.
I can't understand why this doesn't print false.
import java.lang.reflect.Field;

public class MyClass {
    public MyClass() {
        try {
            // Replace the backing array of the literal "true" with that of "false".
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
            // Exceptions are swallowed here.
        }
    }

    public static void main(String[] args) {
        MyClass m = new MyClass();
        System.out.println(m.equals(m));
    }
}
Surely, because of string interning, the "true" instance being modified is exactly the same one used in the print method of PrintStream?
public void print(boolean b) {
    write(b ? "true" : "false");
}
What am I missing?
Edit
An interesting point by @yshavit is that if you add the line
System.out.println(true);
before the try, the output is
true
false
This is arguably a HotSpot JVM bug.
The problem is in the string literal interning mechanism. java.lang.String instances for the string literals are created lazily during constant pool resolution.
- Initially a string literal is represented in the constant pool by a CONSTANT_String_info structure that points to a CONSTANT_Utf8_info.
- Each class has its own constant pool. That is, MyClass and PrintStream each have their own pair of CONSTANT_String_info / CONSTANT_Utf8_info cpool entries for the literal 'true'.
- When a CONSTANT_String_info is accessed for the first time, the JVM initiates the process of resolution. String interning is part of this process.
- To find a match for a literal being interned, the JVM compares the contents of the CONSTANT_Utf8_info with the contents of the string instances in the StringTable.
- ^^^ And here is the problem. Raw UTF data from the cpool is compared to Java char[] array contents that can be spoofed by a user via Reflection.
So, what's happening in your test?
- f.set("true", f.get("false")) initiates the resolution of the literal 'true' in MyClass.
- The JVM discovers no instances in StringTable matching the sequence 'true', and creates a new java.lang.String, which is stored in StringTable.
- The value of that String from StringTable is replaced via Reflection.
- System.out.println(true) initiates the resolution of the literal 'true' in the PrintStream class.
- The JVM compares the UTF sequence 'true' with the Strings from StringTable, but finds no match, since that String already has the value 'false'. Another String for 'true' is created and placed in StringTable.
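The steps above can be mimicked with a toy model that needs no reflection. This is only a sketch of the lookup as described in this answer, not HotSpot code: ToyString, ToyStringTable and resolve are made-up names, and a plain list stands in for the real hash table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: all names are hypothetical, none of this is a HotSpot internal.
class ToyInterningDemo {

    // Stand-in for java.lang.String with a backing array that can be swapped.
    static final class ToyString {
        char[] value;
        ToyString(String s) { value = s.toCharArray(); }
        @Override public String toString() { return new String(value); }
    }

    // Stand-in for the StringTable: lookup compares the literal's characters
    // against the *current* contents of each stored string.
    static final class ToyStringTable {
        private final List<ToyString> entries = new ArrayList<>();

        ToyString resolve(String utf) {
            for (ToyString s : entries) {
                if (Arrays.equals(s.value, utf.toCharArray())) {
                    return s;
                }
            }
            ToyString created = new ToyString(utf);
            entries.add(created);
            return created;
        }

        int size() { return entries.size(); }
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable();
        ToyString inMyClass = table.resolve("true");      // literal resolved in MyClass
        inMyClass.value = "false".toCharArray();          // simulated reflective overwrite
        ToyString inPrintStream = table.resolve("true");  // literal resolved in PrintStream
        System.out.println(inMyClass == inPrintStream);   // prints false: two instances
        System.out.println(inMyClass + " " + inPrintStream); // prints "false true"
    }
}
```

The second resolve call misses because the only candidate in the table no longer reads 'true', which is the mismatch described in the last step.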
Why do I think this is a bug?
JLS §3.10.5 and JVMS §5.1 require that string literals containing the same sequence of characters must point to the same instance of java.lang.String.
However, in the following code the resolution of two string literals with the same sequence of characters results in different instances.
import java.lang.reflect.Field;

public class Test {
    static class Inner {
        static String trueLiteral = "true";
    }

    public static void main(String[] args) throws Exception {
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        f.set("true", f.get("false"));
        // The literal in Inner is resolved only now, after the overwrite.
        if ("true" == Inner.trueLiteral) {
            System.out.println("OK");
        } else {
            System.out.println("BUG!");
        }
    }
}
A possible fix for the JVM is to store a pointer to the original UTF sequence in StringTable along with the java.lang.String object, so that the interning process does not compare cpool data (inaccessible to the user) with value arrays (accessible via Reflection).
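As a rough illustration of that idea, here is a toy table that finds entries by the original literal text instead of by the current contents of the value array. It is a sketch under assumptions, not a patch for HotSpot: KeyedStringTable, ToyString and resolve are hypothetical names, and an immutable map key plays the role of the pointer to the original UTF sequence.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed fix; all names are hypothetical.
class KeyedInterningDemo {

    // Stand-in for java.lang.String with a backing array that can be swapped.
    static final class ToyString {
        char[] value;
        ToyString(String s) { value = s.toCharArray(); }
    }

    // Entries are looked up by the original literal text, which user code
    // cannot change, and never by the mutable value array.
    static final class KeyedStringTable {
        private final Map<String, ToyString> byOriginalUtf = new HashMap<>();

        ToyString resolve(String utf) {
            return byOriginalUtf.computeIfAbsent(utf, ToyString::new);
        }
    }

    public static void main(String[] args) {
        KeyedStringTable table = new KeyedStringTable();
        ToyString first = table.resolve("true");
        first.value = "false".toCharArray();      // simulated reflective overwrite
        ToyString second = table.resolve("true");
        System.out.println(first == second);      // prints true: one shared instance
    }
}
```

With this lookup the overwritten string would still print as 'false' everywhere, but both literals would resolve to the same instance, which is what the specifications ask for.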
I've written this as a community wiki as I don't know if it's right and don't understand the details anyway.
What appears to happen is that when a string literal is encountered at runtime, the JVM checks the string pool (using equals) to see if the string is already there. If it isn't there, a new instance is used. This object (either the new one or the one that was already in the string pool) is the one that will be used from now on for all string literals in that class that are the same.
Consider this example:
// MyClass.java
import java.lang.reflect.Field;

public class MyClass {
    public MyClass() {
        try {
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
            // Exceptions are swallowed here.
        }
    }

    public static void main(String[] args) {
        System.out.println(true); // 1
        new MyClass();
        System.out.println(true); // 2
        System.out.println("true"); // 3
        printTrue();
        OtherClass.printTrue();
    }

    public static void printTrue() {
        System.out.println("true"); // 4
    }
}

// OtherClass.java
public class OtherClass {
    static void printTrue() {
        System.out.println("true"); // 5
    }
}
This prints:
true
false
false
false
true
My explanation:
In line 1, the JVM encounters the literal "true" in the PrintStream class. A new string is added to the pool. Then new MyClass() is invoked. Inside this constructor, the JVM encounters the string literal "true" in the MyClass class. This string is already in the pool, so the instance in the pool is the one that will be used, but crucially it is also the one that will later be used in lines 3 and 4. Then the array backing this string is modified. Lines 2, 3 and 4 therefore all print false. Next, OtherClass.printTrue() is invoked and the JVM encounters the string literal "true" for the first time in OtherClass. This string is not equal to the one in the pool because the one in the pool now has backing array [f, a, l, s, e]. Therefore a new string instance is used and true is printed at line 5.
Now suppose we comment out line 1:
// System.out.println(true); // 1
This time the output is:
true
false
false
true
Why does line 2 produce a different result? The difference here is that the literal "true" is not encountered in the PrintStream class until after the backing array has been modified. So the "wrong" string is not the one used in the PrintStream class. However, lines 3 and 4 continue to print "false" for the same reason as above.
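Both runs can be replayed with a small simulation of the behaviour guessed at in this answer. Everything in it is an assumption made for illustration: LiteralCacheDemo, ToyString and literal are invented names, each class gets its own cache of already resolved literals, and the pool is searched by comparing current contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation; every name is hypothetical and nothing here is a real JVM structure.
class LiteralCacheDemo {

    // Stand-in for java.lang.String with a backing array that can be swapped.
    static final class ToyString {
        char[] value;
        ToyString(String s) { value = s.toCharArray(); }
        @Override public String toString() { return new String(value); }
    }

    private final List<ToyString> pool = new ArrayList<>();
    // One cache of resolved literals per class name.
    private final Map<String, Map<String, ToyString>> resolved = new HashMap<>();

    // Returns the instance a given class uses for a given literal.
    ToyString literal(String className, String text) {
        Map<String, ToyString> cache = resolved.computeIfAbsent(className, k -> new HashMap<>());
        ToyString hit = cache.get(text);
        if (hit == null) {
            for (ToyString s : pool) {
                if (Arrays.equals(s.value, text.toCharArray())) { hit = s; break; }
            }
            if (hit == null) { hit = new ToyString(text); pool.add(hit); }
            cache.put(text, hit);
        }
        return hit;
    }

    // Replays main(); withLine1 toggles the first println(true).
    static List<String> run(boolean withLine1) {
        LiteralCacheDemo jvm = new LiteralCacheDemo();
        List<String> out = new ArrayList<>();
        if (withLine1) out.add(jvm.literal("PrintStream", "true").toString()); // 1
        ToyString t = jvm.literal("MyClass", "true");   // constructor: "true"
        ToyString f = jvm.literal("MyClass", "false");  // constructor: "false"
        t.value = f.value;                              // simulated f.set(...)
        out.add(jvm.literal("PrintStream", "true").toString()); // 2
        out.add(jvm.literal("MyClass", "true").toString());     // 3
        out.add(jvm.literal("MyClass", "true").toString());     // 4
        out.add(jvm.literal("OtherClass", "true").toString());  // 5
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [true, false, false, false, true]
        System.out.println(run(false));  // prints [true, false, false, true]
    }
}
```

The simulation produces the same two output sequences as the real program, which fits the explanation above without proving that the JVM works this way internally.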