Following the discussion in "Java String.split memory leak?" about taking a substring of a String, I have been analyzing two sample usages of substring.
It is said that an object does not get garbage collected if the caller stores a substring of one of its fields. When I run the code below I get an OutOfMemoryError, and VisualVM shows the allocated char[] size increasing steadily:
public class TestGC {
    private String largeString = new String(new byte[100000]);

    String getString() {
        return this.largeString.substring(0,2);
        //return new String(this.largeString.substring(0,2));
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> list = new java.util.ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            TestGC gc = new TestGC();
            list.add(gc.getString());
        }
    }
}
With the following code I did not get an error. Watching memory usage in VisualVM, I saw the allocated char[] size increase, then drop at some point, then increase and drop again (the GC doing its job), and this cycle keeps repeating:
public class TestGC {
    private String largeString = new String(new byte[100000]);

    String getString() {
        //return this.largeString.substring(0,2);
        return new String(this.largeString.substring(0,2));
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> list = new java.util.ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            TestGC gc = new TestGC();
            list.add(gc.getString());
        }
    }
}
I really want to understand what the GC collects and removes from the heap in the second example. Why can't the GC collect the same object in the first example?
In the first example, largeString.substring(0,2) returns a reference, and in the second example new String(this.largeString.substring(0,2)) creates a new object. Shouldn't both cases be unproblematic for the GC?
My understanding, based on all the answers and comments, especially those from David Wallace and DaveJohnston:
[Figure: references among objects in the first example]
[Figure: references among objects in the second example]
Resolve the memory leak in JDK 1.6
http://javaexplorer03.blogspot.in/2015/10/how-to-resolve-memory-leak-in-jdk-16.html
subString = string.substring(3, 10) + "";
In the code above, string.substring(3, 10) returns a substring that points to the original string's char array, so the substring prevents the old string's array (char value[]) from being garbage collected.
But when we append the empty string to the substring, a new String is created with its own char value[] array (on the heap, not in the constant pool, because the expression is not a compile-time constant), so the old string's array can be garbage collected.
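For comparison, here is a minimal sketch of two ways to detach a short substring from its parent on a JVM older than 1.7.0_06. The class and method names (SubstringCopy, detachByCopy, detachByConcat) are made up for illustration and do not come from any library. On later JVMs substring already copies, so both helpers are redundant but harmless.

```java
// Hypothetical helper class; the names are illustrative only.
class SubstringCopy {

    // Copy constructor: on Java 6 this allocates a char[] sized to the substring.
    static String detachByCopy(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    // Concatenation with "": a new String is built at run time.
    static String detachByConcat(String source, int begin, int end) {
        return source.substring(begin, end) + "";
    }

    public static void main(String[] args) {
        String string = "0123456789abcdef";
        System.out.println(detachByCopy(string, 3, 10));   // prints 3456789
        System.out.println(detachByConcat(string, 3, 10)); // prints 3456789
    }
}
```

The copy constructor states the intent more directly than the "" trick, which relies on the reader knowing why the concatenation is there.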
In the first example, each time around the loop you create a new TestGC object, and with it a new String initialised from the 100000-byte array. When you call String.substring, you get back a String that shares the same large char[], with its offset set to 0 and its count set to 2. So all the data is still in memory, but when you use the String you only see the 2 characters specified in the substring call.
In the second example you again create the new String each time around the loop, but by calling new String(String.substring) you discard the rest of the String and keep only the 2 characters in memory, so the rest can be garbage collected.
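The sharing described above can be modelled with a small class. This is a simplified sketch, assuming the Java 6 layout of a backing array plus offset and count. SharedString and trimmedCopy are hypothetical names, and this is not the real java.lang.String source.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the pre-1.7.0_06 String layout.
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of the first visible character
    final int count;    // number of visible characters

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Java 6 style substring: a new object that points at the same array.
    SharedString substring(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    // Counterpart of new String(sub): copy only the visible characters.
    SharedString trimmedCopy() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(copy, 0, count);
    }

    public static void main(String[] args) {
        SharedString large = new SharedString(new char[100000], 0, 100000);
        SharedString sub = large.substring(0, 2);
        System.out.println(sub.value.length);               // 100000: whole array still referenced
        System.out.println(sub.trimmedCopy().value.length); // 2: only the visible part is kept
    }
}
```

As long as sub is reachable, the 100000-element array is reachable too. Only the trimmed copy lets it go.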
As the links in the comments say, this behaviour changed in 1.7.0_06, so the String returned by String.substring no longer shares the char[] of the original.
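If code has to run on both sides of that boundary, the version cut-off can be sketched as a check on the java.version string. SubstringSharing and sharesBackingArray are hypothetical names. The parsing is a rough guess that assumes the usual "1.x.0_uu" and "N.x.y" formats, so treat it as a diagnostic aid rather than a reliable detector.

```java
// Hypothetical helper: guesses from a java.version string whether
// String.substring shares the parent's char[] (assumed true before 1.7.0_06).
class SubstringSharing {

    static boolean sharesBackingArray(String javaVersion) {
        // "1.6.0_45" -> [1, 6, 0, 45]; "17.0.2" -> [17, 0, 2]
        String[] parts = javaVersion.split("[._-]");
        if (leadingInt(parts, 0) != 1) {
            return false; // "9", "11.0.2", "17.0.2": newer scheme, substring copies
        }
        int major = leadingInt(parts, 1);
        if (major != 7) {
            return major < 7; // 1.6 and older share, 1.8 copies
        }
        return leadingInt(parts, 3) < 6; // 1.7.0 up to update 5 share
    }

    // Parses the leading digits of parts[index]; 0 if missing or non-numeric.
    private static int leadingInt(String[] parts, int index) {
        if (index >= parts.length) {
            return 0;
        }
        String digits = parts[index].replaceAll("\\D.*", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " shares backing array: " + sharesBackingArray(version));
    }
}
```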
I wouldn't expect the behaviour that you've described in Java 7, because substrings are now handled completely differently. However ...
In Java 6
In the first example, the substring that you're storing in your list uses the same character array as the original String inside the TestGC object, so that character array can't be returned to the heap.
In the second example, a new String is allocated with its own character array when you do the copy, so the original String can be returned to the heap once the TestGC object becomes unreachable. So you don't leak a 100000-character array on every iteration through the loop.
The explicit new String() constructor call creates a new String instance with a copy of the relevant part of the char[] (as opposed to the first example, where the underlying huge char[] is shared). So, in your second example, the huge String gets allocated in each loop, but discarded after the TestGC instance is discarded at the end of the loop.
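To make that lifetime explicit, here is a bounded variant of the second example, written as a sketch. BoundedSubstringDemo and collect are illustrative names, and the iteration count is kept small so the run finishes quickly on any heap size.

```java
import java.util.ArrayList;
import java.util.List;

// Bounded variant of the second example; the names are illustrative only.
class BoundedSubstringDemo {
    private final String largeString = new String(new char[100000]);

    String getString() {
        // The copy keeps only 2 characters; largeString is not referenced by it.
        return new String(largeString.substring(0, 2));
    }

    static List<String> collect(int iterations) {
        List<String> list = new ArrayList<String>();
        for (int i = 0; i < iterations; i++) {
            BoundedSubstringDemo holder = new BoundedSubstringDemo();
            list.add(holder.getString());
            // After this iteration nothing refers to holder or its large string,
            // so both are eligible for garbage collection.
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> kept = collect(1000);
        System.out.println(kept.size());          // prints 1000
        System.out.println(kept.get(0).length()); // prints 2
    }
}
```

The list ends up holding 1000 two-character strings, while each large string is reachable only for the duration of one iteration.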