Java “for” statement implementation prevents garba

Published 2020-02-10 03:21

Question:

UPD 21.11.2017: the bug is fixed in JDK, see comment from Vicente Romero

Summary:

If the enhanced for statement is used with any Iterable implementation, the collection remains in heap memory until the end of the current scope (method, statement body) and is not garbage collected, even if there are no other references to the collection and the application needs to allocate new memory.

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8175883

https://bugs.openjdk.java.net/browse/JDK-8175883

The example:

Consider the following code, which allocates a list of large strings with random content:

import java.util.ArrayList;
public class IteratorAndGc {

    // number of strings and the size of every string
    static final int N = 7500;

    public static void main(String[] args) {
        System.gc();

        gcInMethod();

        System.gc();
        showMemoryUsage("GC after the method body");

        ArrayList<String> strings2 = generateLargeStringsArray(N);
        showMemoryUsage("Third allocation outside the method is always successful");
    }

    // main testable method
    public static void gcInMethod() {

        showMemoryUsage("Before first memory allocating");
        ArrayList<String> strings = generateLargeStringsArray(N);
        showMemoryUsage("After first memory allocation");


        // this loop is the only difference: once the iterator is created, the memory is not collected until the end of this method
        for (String string : strings);
        showMemoryUsage("After iteration");

        strings = null; // discard the reference to the array

        // Some say this call does not guarantee garbage collection;
        // Oracle says "the Java Virtual Machine has made a best effort to reclaim space from all discarded objects".
        // Either way, the program behaves the same with or without this call. You may remove it and test.
        System.gc();

        showMemoryUsage("After force GC in the method body");

        try {
            System.out.println("Try to allocate memory in the method body again:");
            ArrayList<String> strings2 = generateLargeStringsArray(N);
            showMemoryUsage("After secondary memory allocation");
        } catch (OutOfMemoryError e) {
            showMemoryUsage("!!!! Out of memory error !!!!");
            System.out.println();
        }
    }

    // function to allocate and return a reference to a lot of memory
    private static ArrayList<String> generateLargeStringsArray(int N) {
        ArrayList<String> strings = new ArrayList<>(N);
        for (int i = 0; i < N; i++) {
            StringBuilder sb = new StringBuilder(N);
            for (int j = 0; j < N; j++) {
                sb.append((char)Math.round(Math.random() * 0xFFFF));
            }
            strings.add(sb.toString());
        }

        return strings;
    }

    // helper method to display current memory status
    public static void showMemoryUsage(String action) {
        long free = Runtime.getRuntime().freeMemory();
        long total = Runtime.getRuntime().totalMemory();
        long max = Runtime.getRuntime().maxMemory();
        long used = total - free;
        System.out.printf("\t%40s: %10dk of max %10dk%n", action, used / 1024, max / 1024);
    }
}

Compile and run it with limited memory, for example 180 MB:

javac IteratorAndGc.java   &&   java -Xms180m -Xmx180m IteratorAndGc

At runtime I get:

Before first memory allocating: 1251k of max 176640k

After first memory allocation: 131426k of max 176640k

After iteration: 131426k of max 176640k

After force GC in the method body: 110682k of max 176640k (almost nothing collected)

Try to allocate memory in the method body again:

     !!!! Out of memory error !!!!:     168948k of max     176640k

GC after the method body: 459k of max 176640k (the garbage is collected!)

Third allocation outside the method is always successful: 117740k of max 163840k

So, inside gcInMethod() I allocate the list, iterate over it, discard the reference to the list, (optionally) force garbage collection, and allocate a similar list again. But I can't allocate the second list because of a lack of memory.

At the same time, outside the method body I can successfully force garbage collection (optional) and allocate a list of the same size again!

To avoid this OutOfMemoryError inside the method body, it is enough to remove or comment out just this one line:

for (String string : strings); <-- this is the evil!!!

and then output looks like this:

Before first memory allocating: 1251k of max 176640k

After first memory allocation: 131409k of max 176640k

After iteration: 131409k of max 176640k

After force GC in the method body: 497k of max 176640k (the garbage is collected!)

Try to allocate memory in the method body again:

After secondary memory allocation: 115541k of max 163840k

GC after the method body: 493k of max 163840k (the garbage is collected!)

Third allocation outside the method is always successful: 121300k of max 163840k

So, without the for iteration, the garbage is successfully collected after discarding the reference to strings, and the list can be allocated a second time (inside the method body) and a third time (outside the method).

My supposition:

The for syntax construct is compiled to:

Iterator iter = strings.iterator();
while (iter.hasNext()) {
    iter.next();
}

(I checked this by disassembling the class with javap -c IteratorAndGc.class.)

It looks like this iter reference stays in scope until the end of the method. You have no access to the reference to set it to null, so the GC can't collect the list.

Maybe this is normal behavior (maybe it is even specified for javac, but I haven't found where), but IMHO if the compiler creates some instances, it should take care of discarding them from the scope after use.

This is how I would expect the for statement to be implemented:

Iterator iter = strings.iterator();
while (iter.hasNext()) {
    iter.next();
}
iter = null; // <--- flush the water!

Used java compiler and runtime versions:

javac 1.8.0_111

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Note:

  • The question is not about programming style, best practices, conventions and so on; it is about the efficiency of the Java platform.

  • The question is not about System.gc() behavior (you may remove all gc calls from the example): during the second strings allocation the JVM must release the discarded memory.

Reference to the test Java class; online compiler to test (this resource has only 50 MB of heap, so use N = 5000)

Answer 1:

Thanks for the bug report. We have fixed this bug, see JDK-8175883. As commented here, in the case of the enhanced for, javac was generating synthetic variables, so for code like:

void foo(String[] data) {
    for (String s : data);
}

javac was approximately generating:

for (String[] arr$ = data, len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
    String s = arr$[i$];
}

As mentioned above, this translation approach means that the synthetic variable arr$ holds a reference to the array data, which prevents the GC from collecting the array once it is no longer referenced inside the method. This bug has been fixed by generating this code:

String[] arr$ = data;
String s;
for (int len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
    s = arr$[i$];
}
arr$ = null;
s = null;

The idea is to set to null any synthetic variable of a reference type created by javac to translate the loop. If we were talking about an array of a primitive type, the last assignment to null would not be generated by the compiler. The bug has been fixed in the JDK repo.
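The same idea can be applied by hand to the Iterable case from the question. The sketch below is only an illustration, not the code javac emits: the class ExplicitIteratorCleanup, the method totalLength and the variable names are made up for this example, and it assumes the loop is written with an explicit Iterator so that every reference it uses can be cleared afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class; the names are not taken from the JDK fix.
class ExplicitIteratorCleanup {

    // Builds a list of n short strings, iterates over it with an explicit
    // iterator, then clears every local reference the loop used.
    static int totalLength(int n) {
        List<String> strings = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            strings.add("item" + i);
        }

        int total = 0;
        Iterator<String> it = strings.iterator();
        String s;
        while (it.hasNext()) {
            s = it.next();
            total += s.length();
        }
        // Mirror the fixed translation: null out the references used by the loop.
        it = null;
        s = null;
        strings = null;

        // From here on, no local variable of this method refers to the list,
        // so a large allocation made at this point would not compete with it.
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(3));
    }
}
```

With an enhanced for loop the iterator variable is hidden, so this manual cleanup is only possible when the loop is spelled out like this.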



Answer 2:

The only relevant part of the enhanced for statement, here, is the extra local reference to the object.

Your example can be reduced to

public class Example {
    private static final int length = (int) (Runtime.getRuntime().maxMemory() * 0.8);

    public static void main(String[] args) {
        byte[] data = new byte[length];
        Object ref = data; // this is the effect of your "foreach loop"
        data = null;
        // ref = null; // uncommenting this also makes this complete successfully
        byte[] data2 = new byte[length];
    }
}

This program will also fail with an OutOfMemoryError. If you remove the ref declaration (and its initialization), it will complete successfully.

The first thing you need to understand is that scope has nothing to do with garbage collection. Scope is a compile time concept that defines where identifiers and names in a program's source code can be used to refer to program entities.

Garbage collection is driven by reachability. If the JVM can determine that an object cannot be accessed by any potential continuing computation from any live thread, then it will consider it eligible for garbage collection. Also, the System.gc() call is useless here because the JVM will perform a major collection anyway if it cannot find space to allocate a new object.

So the question becomes: why can't the JVM determine that the byte[] object is no longer accessed if we store it in a second local variable?

I don't have an answer for that. Different garbage collection algorithms (and JVMs) may behave differently in that regard. It seems that this JVM doesn't mark the object as unreachable when a second entry in the local variable table has a reference to that object.
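One way to observe this from inside a program is a WeakReference, which is cleared only once its referent is no longer strongly reachable. The probe below is a rough sketch under several assumptions: the class and method names are hypothetical, Reference.reachabilityFence requires Java 9 or later, and the second printed line is not guaranteed either way because it depends on the JVM, the collector, and whether the method runs interpreted or compiled.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical probe class, not part of the original example.
class ReachabilityProbe {

    // Returns true if the array is still present after a GC request made
    // while a local variable keeps it strongly reachable.
    static boolean presentWhileStronglyReachable() {
        byte[] data = new byte[1024];
        WeakReference<byte[]> probe = new WeakReference<>(data);
        System.gc();
        boolean present = probe.get() != null;
        // Keeps data strongly reachable up to this point (Java 9+).
        Reference.reachabilityFence(data);
        return present;
    }

    public static void main(String[] args) {
        System.out.println("present while reachable: " + presentWhileStronglyReachable());

        byte[] data = new byte[1024];
        WeakReference<byte[]> probe = new WeakReference<>(data);
        Object ref = data; // second local reference, as in the reduced example
        data = null;
        System.gc();
        // Not guaranteed either way: the result depends on the JVM, the
        // collector, and whether this method is interpreted or compiled.
        System.out.println("cleared with second local set: " + (probe.get() == null));
    }
}
```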


Here's a different scenario where the JVM didn't behave exactly as you might have expected with regard to garbage collection:

  • OutOfMemoryError when seemingly unrelated code block commented out


Answer 3:

So this is actually an interesting question that could have benefited from a slightly different wording. More specifically, focusing on the generated bytecode instead would have cleared a lot of the confusion. So let's do that.

Given this code:

List<Integer> foo = new ArrayList<>();
for (Integer i : foo) {
  // nothing
}

This is the generated bytecode:

   0: new           #2                  // class java/util/ArrayList
   3: dup           
   4: invokespecial #3                  // Method java/util/ArrayList."<init>":()V
   7: astore_1      
   8: aload_1       
   9: invokeinterface #4,  1            // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
  14: astore_2      
  15: aload_2       
  16: invokeinterface #5,  1            // InterfaceMethod java/util/Iterator.hasNext:()Z
  21: ifeq          37
  24: aload_2       
  25: invokeinterface #6,  1            // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
  30: checkcast     #7                  // class java/lang/Integer
  33: astore_3      
  34: goto          15

So, play by play:

  • Store the new list in local variable 1 ("foo")
  • Store the iterator in local variable 2
  • For each element, store the element in local variable 3

Note that after the loop, there's no cleanup of anything that was used in the loop. That isn't restricted to the iterator: the last element is still stored in local variable 3 after the loop ends, even though there's no reference to it in the code.

So before you go "that's wrong, wrong, wrong", let's see what happens when I add this code after that code above:

byte[] bar = new byte[0];

You get this bytecode after the loop:

  37: iconst_0      
  38: newarray       byte
  40: astore_2      

Oh, look at that. The newly declared local variable is being stored in the same "local variable" as the iterator. So now the reference to the iterator is gone.

Note that this is different from the Java code you assume is the equivalent. The actual Java equivalent, which generates the exact same bytecode, is this:

List<Integer> foo = new ArrayList<>();
for (Iterator<Integer> i = foo.iterator(); i.hasNext(); ) {
  Integer val = i.next();
}

And still there's no cleanup. Why's that?

Well, here we are in guessing territory, unless it's actually specified in the JVM spec (haven't checked). Anyway, to do cleanup, the compiler would have to generate extra bytecode (2 instructions, aconst_null and astore_<n>) for each variable that's going out of scope. This would mean the code runs slower; and to avoid that, possibly complicated optimizations would have to be added to the JIT.

So, why does your code fail?

You end up in a similar situation as the above. The iterator is allocated and stored in local variable 1. Then your code tries to allocate the new string array and, because local variable 1 is not in use anymore, it would be stored in the same local variable (check the bytecode). But the allocation happens before the assignment, so there's a reference to the iterator still, so there's no memory.

If you add this line before the try block, things work, even if you remove the System.gc() call:

int i = 0;

So, it seems the JVM developers made a choice (generate smaller / more efficient bytecode instead of explicitly nulling variables that go out of scope), and you happen to have written code that doesn't behave well under the assumptions they made about how people write code. Given that I've never seen this problem in actual applications, seems like a minor thing to me.



Answer 4:

As already stated in the other answers, the concept of variable scopes is not known at runtime. In the compiled class files, local variables are only places within a stack frame (addressed by an index), to which writes and reads are performed. If multiple variables have disjoint scopes, they may use the same index, but there is no formal declaration of them. Only the write of a new value discards the old one.

So, there are three ways in which a reference held in local variable storage can be considered unused:

  1. The storage location is overwritten by a new value
  2. The method exits
  3. No subsequent code reads the value

It should be obvious that the third point is the hardest to check, hence, it does not always apply, but when the optimizer starts its work, it may lead to surprises in the other direction, as explained in “Can java finalize an object when it is still in scope?” and “finalize() called on strongly reachable object in Java 8”.

In your case, the application runs only briefly and is likely not optimized, which can lead to references not being recognized as unused under point 3 when points 1 and 2 do not apply.
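Point 2 suggests one possible workaround that does not depend on slot reuse or on the optimizer: run the loop in its own method, so that the hidden iterator lives in a stack frame that is gone by the time the caller allocates again. A minimal sketch, with hypothetical names (LoopInHelper, countNonEmpty, run) and tiny data in place of the large strings from the question:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class illustrating point 2 (the method exits).
class LoopInHelper {

    // The iterator created by the enhanced for loop lives in this method's
    // frame, which is discarded when the method returns.
    static int countNonEmpty(List<String> strings) {
        int count = 0;
        for (String s : strings) {
            if (!s.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    static int run() {
        List<String> strings = new ArrayList<>();
        strings.add("a");
        strings.add("");
        strings.add("b");

        int count = countNonEmpty(strings);

        // The caller's frame held only this one reference to the list.
        strings = null;
        // A large allocation here no longer competes with the old list.
        return count;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```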

You can easily verify that this is the case. When you change the line

ArrayList<String> strings2 = generateLargeStringsArray(N);

to

ArrayList<String> strings2 = null;
strings2 = generateLargeStringsArray(N);

the OutOfMemoryError goes away. The reason is that the storage location holding the Iterator used in the preceding for loop has not been overwritten at this point. The new local variable strings2 will reuse the storage, but this only takes effect when a new value is actually written to it. So the initialization with null before calling generateLargeStringsArray(N) will overwrite the Iterator reference and allow the old list to be collected.

Alternatively, you can run the program in the original form using the option -Xcomp. This forces compilation of all methods. On my machine, it had a noticeable startup slowdown, but due to the variable usage analysis, the OutOfMemoryError also went away.

Having an application that allocates that much memory (compared to the max heap size) during the initialization, i.e. when most methods run interpreted, is an unusual corner case. Usually, most hot methods are sufficiently compiled before the memory consumption is that high. If you encounter this corner case repeatedly in a real life application, well, then -Xcomp might work for you.



Answer 5:

Finally, the Oracle/OpenJDK bug has been accepted, approved and fixed:

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8175883

https://bugs.openjdk.java.net/browse/JDK-8175883

Quoting the comments from the threads:

This is an issue reproducible both on 8 and 9

There is some issue the program keeps it's own implicit auto-generated reference to the memory block till next implicit usage and its memory is being locked that causing OOM

(this proves @vanza's expectation, see this example from the JDK developer)

According the spec this should not happen

(this is an answer to my question: if compiler creates some instances it should care about discarding them from the scope after using)

UPD 21.11.2017: the bug is fixed in JDK, see comment from Vicente Romero



Answer 6:

Just to summarize the answers:

As @sotirios-delimanolis mentioned in his comment about the enhanced for statement, my assumption is explicitly defined: the for sugar statement is compiled to an Iterator with hasNext()/next() calls:

#i is an automatically generated identifier that is distinct from any other identifiers (automatically generated or otherwise) that are in scope (§6.3) at the point where the enhanced for statement occurs.

As @vanza then showed in his answer, this automatically generated identifier may or may not be overwritten later. If it is overwritten, the memory may be released; if not, the memory is not released.

Still (for me) an open question: if the Java compiler or the JVM creates some implicit references, shouldn't it then take care of discarding those references later? Is there any guarantee that the same auto-generated iterator reference will be reused before the next memory allocation? Shouldn't it be a rule that those who allocate memory take care of releasing it? I'd say it must take care of this. Otherwise the behavior is undefined (it may fail with an OutOfMemoryError, or may not, who knows...)

Yes, my example is a corner case (nothing is initialized between the for iterator and the next memory allocation), but this doesn't mean it is an impossible case. Nor does it mean this case is hard to hit: it is quite likely when working in a limited-memory environment with large data and re-allocating memory immediately after it has been used. I found this case in my working application, where I parse a large XML file that "eats" more than half of the memory.

(And the question is not only about iterators and for loops; I guess it is a common issue: the compiler or the JVM sometimes doesn't clean up its own implicit references.)