Java string concatenation and interning

Posted 2019-01-19 22:23

Question:

Question 1

String a1 = "I Love" + " Java";
String a2 = "I Love " + "Java";
System.out.println( a1 == a2 ); // true

String b1 = "I Love";
b1 += " Java";
String b2 = "I Love ";
b2 += "Java";
System.out.println( b1 == b2 ); // false

In the first case, I understand that it is a concatenation of two string literals, so the result "I Love Java" will be interned, giving the result true. However, I'm not sure about the second case.

Question 2

String a1 = "I Love" + " Java"; // line 1
String a2 = "I Love " + "Java"; // line 2

String b1 = "I Love";
b1 += " Java";
String b2 = "I Love ";
b2 += "Java";
String b3 = b1.intern();
System.out.println( b1 == b3 ); // false

The above returns false, but if I comment out lines 1 and 2, it returns true. Why is that?

Answer 1:

The first part of your question is simple: the Java compiler treats a concatenation of string literals as a single string literal, i.e.

"I Love" + " Java"

and

"I Love Java"

are two identical string literals, which get properly interned.

The same does not apply to the += operation on String variables: b1 += " Java" is not a constant expression, so b1 and b2 are constructed at run time as two distinct objects that are not interned.
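The difference can be seen in a minimal sketch. The class and method names below (ConcatDemo and so on) are made up for illustration; the string values are the ones from the question.

```java
class ConcatDemo {
    // Both operands are literals, so the compiler folds each expression
    // into the single constant "I Love Java" before the program runs.
    static boolean literalsShareOneObject() {
        String a1 = "I Love" + " Java";
        String a2 = "I Love " + "Java";
        return a1 == a2;
    }

    // += on a non-final variable is not a constant expression, so each
    // result is a newly created String object.
    static boolean runtimeResultsShareOneObject() {
        String b1 = "I Love";
        b1 += " Java";
        String b2 = "I Love ";
        b2 += "Java";
        return b1 == b2;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());       // true
        System.out.println(runtimeResultsShareOneObject()); // false
    }
}
```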

The second part is trickier. Recall that b1.intern() may return b1 itself or some other String object that is equal to it. When you keep a1 and a2, the literal "I Love Java" is already in the string pool, so b1.intern() returns that pooled object (the one a1 refers to) rather than b1. When you comment out a1 and a2, there is no existing copy to return, so b1.intern() adds b1 to the pool and gives you back b1 itself.
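Both situations can be reproduced in one program. This is a sketch, not a definitive test: InternDemo and its method names are hypothetical, and the second method appends System.nanoTime() only to get a string that is assumed not to be in the pool yet. The "returns the same object" result relies on the documented behavior of intern() as implemented by Java 7 and later JVMs.

```java
class InternDemo {
    // Builds "I Love Java" at run time, so the result is a fresh object
    // that is not the pooled literal.
    static String buildAtRuntime() {
        String b1 = "I Love";
        b1 += " Java";
        return b1;
    }

    // Case 1: an equal literal is already pooled, so intern() returns
    // that pooled object instead of b1.
    static boolean internReturnsPooledLiteral() {
        String a1 = "I Love" + " Java"; // compile-time constant, pooled
        String b1 = buildAtRuntime();
        String b3 = b1.intern();
        return b1 != a1 && b3 == a1 && b1 != b3;
    }

    // Case 2: no equal string is pooled (assumption: the nanoTime suffix
    // makes the content unique), so intern() pools and returns the
    // receiver itself.
    static boolean internReturnsReceiver() {
        String unique = "I Love Java " + System.nanoTime();
        return unique.intern() == unique;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledLiteral()); // true
        System.out.println(internReturnsReceiver());      // true on Java 7+
    }
}
```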



Answer 2:

From the intern() docs:

All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of The Java™ Language Specification.

And from JLS §3.10.5:

  • Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
  • Strings computed by concatenation at run time are newly created and therefore distinct.

Your string b1 is computed by concatenation at run time, so it is not actually interned. Hence the difference.
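One consequence of the constant-expression rule is that a final String variable initialized with a literal also counts as a constant, so concatenating it is folded at compile time. The sketch below illustrates this; ConstantExpressionDemo and its method names are hypothetical, and == is used on strings deliberately to observe object identity.

```java
class ConstantExpressionDemo {
    // prefix is a constant variable (final, initialized with a literal),
    // so prefix + " Java" is a constant expression and is treated like
    // the literal "I Love Java".
    static boolean finalOperandIsFolded() {
        final String prefix = "I Love";
        String viaFinal = prefix + " Java";
        return viaFinal == "I Love Java";
    }

    // Without final, the concatenation happens at run time and produces
    // a newly created String.
    static boolean nonFinalOperandIsFolded() {
        String prefix = "I Love";
        String viaPlain = prefix + " Java";
        return viaPlain == "I Love Java";
    }

    public static void main(String[] args) {
        System.out.println(finalOperandIsFolded());    // true
        System.out.println(nonFinalOperandIsFolded()); // false
    }
}
```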



Answer 3:

Answer for question 1:

You should not compare the contents of two Strings with ==. The == operator compares primitive values (int, long, float, double, boolean, and so on) or object references. This means that if the reference variables (a1, a2, b1, b2) do not hold the same reference (meaning they do not point to the same object in memory), they are not equal under ==, even when the strings contain the same characters.

If you would compare with b1.equals(b2), the expression would be true since the data of the object is the same.
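A short sketch of that contrast, using a hypothetical helper join that concatenates its arguments at run time:

```java
class EqualsDemo {
    // Concatenating two parameters is never a constant expression, so
    // every call returns a newly created String.
    static String join(String head, String tail) {
        return head + tail;
    }

    public static void main(String[] args) {
        String b1 = join("I Love", " Java");
        String b2 = join("I Love ", "Java");

        System.out.println(b1 == b2);      // false: two different objects
        System.out.println(b1.equals(b2)); // true: same character content
    }
}
```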

In the first case, the compiler concatenates the literals at compile time, so both expressions become the same interned string before the program ever runs. Hence the variables a1 and a2 reference the same object and are equal (==).

In the second case, each variable is first assigned a different literal (unlike the first case), and each += then creates a new String object at run time and reassigns the variable to it. Strings are immutable, so nothing is changed in place. Even though the two results have the same content, they are separate objects at separate addresses, and a comparison with == evaluates to false.

As for question 2: @dasblinkenlight already gave a good answer to that.