why does this Java method leak—and why does inlini

Posted 2019-03-31 01:43

I wrote a minimal somewhat-lazy (int) sequence class, GarbageTest.java, as an experiment, to see if I could process very long, lazy sequences in Java, the way I can in Clojure.

Given a naturals() method that returns the lazy, infinite sequence of natural numbers; a drop(n, sequence) method that drops the first n elements of sequence and returns the rest; and an nth(n, sequence) method that simply returns drop(n, sequence).head(), I wrote two tests:

static int N = (int)1e6;

// succeeds @ N = (int)1e8 with java -Xmx10m
@Test
public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

// fails with OutOfMemoryError @ N = (int)1e6 with java -Xmx10m
@Test
public void nthTest() {
    assertThat( nth(N, naturals()), is(N+1));
}

Note that the body of dropTest() was generated by copying the body of nthTest() and then invoking IntelliJ's "inline" refactoring on the nth(N, naturals()) call. So it seems to me that the behavior of dropTest() should be identical to the behavior of nthTest().

But it isn't identical! dropTest() runs to completion with N up to 1e8 whereas nthTest() fails with OutOfMemoryError for N as small as 1e6.

I've avoided inner classes. And I've experimented with a variant of my code, ClearingArgsGarbageTest.java, that nulls method parameters before calling other methods. I've applied the YourKit profiler. I've looked at the byte code. I just cannot find the leak that causes nthTest() to fail.

Where's the "leak"? And why does nthTest() have the leak while dropTest() does not?

Here's the rest of the code from GarbageTest.java in case you don't want to click through to the Github project:

/**
 * a not-perfectly-lazy lazy sequence of ints. see LazierGarbageTest for a lazier one
 */
static class LazyishSeq {
    final int head;

    volatile Supplier<LazyishSeq> tailThunk;
    LazyishSeq tailValue;

    LazyishSeq(final int head, final Supplier<LazyishSeq> tailThunk) {
        this.head = head;
        this.tailThunk = tailThunk;
        tailValue = null;
    }

    int head() {
        return head;
    }

    LazyishSeq tail() {
        if (null != tailThunk)
            synchronized(this) {
                if (null != tailThunk) {
                    tailValue = tailThunk.get();
                    tailThunk = null;
                }
            }
        return tailValue;
    }
}

static class Incrementing implements Supplier<LazyishSeq> {
    final int seed;
    private Incrementing(final int seed) { this.seed = seed;}

    public static LazyishSeq createSequence(final int n) {
        return new LazyishSeq( n, new Incrementing(n+1));
    }

    @Override
    public LazyishSeq get() {
        return createSequence(seed);
    }
}

static LazyishSeq naturals() {
    return Incrementing.createSequence(1);
}

static LazyishSeq drop(
        final int n,
        final LazyishSeq lazySeqArg) {
    LazyishSeq lazySeq = lazySeqArg;
    for( int i = n; i > 0 && null != lazySeq; i -= 1) {
        lazySeq = lazySeq.tail();
    }
    return lazySeq;
}

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

2 answers
Anthone
#2 · 2019-03-31 02:22

In your method

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

the parameter variable lazySeq holds a reference to the first element of your sequence during the entire drop operation. This prevents any part of the sequence from being garbage collected.

In contrast, with

public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

the first element of your sequence is returned by naturals() and passed directly to the invocation of drop. It is popped from the operand stack at that point, so dropTest itself holds no reference to it while drop executes.
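To make the reachability argument concrete, here is a minimal sketch. It uses a simplified, single-threaded stand-in for LazyishSeq (an assumption: the volatile and synchronized parts are left out because they do not change what is reachable), and realizedNodesFrom is a hypothetical helper that is not part of the original code. It counts how many already-realized nodes hang off a given reference.

```java
import java.util.function.Supplier;

class HeadRetentionDemo {
    // Simplified stand-in for the question's LazyishSeq (assumption: no locking).
    static final class Seq {
        final int head;
        Supplier<Seq> tailThunk;
        Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get(); // realize the next node once
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    // Same shape as the question's drop: final parameter plus a local cursor.
    static Seq drop(final int n, final Seq seqArg) {
        Seq seq = seqArg;
        for (int i = n; i > 0 && seq != null; i -= 1) {
            seq = seq.tail();
        }
        return seq;
    }

    // Hypothetical helper: counts nodes reachable from `start` through
    // already-realized tailValue links, without forcing any thunk.
    static int realizedNodesFrom(Seq start) {
        int count = 0;
        for (Seq s = start; s != null; s = s.tailValue) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Seq first = from(1);
        Seq rest = drop(1000, first);
        System.out.println(realizedNodesFrom(first)); // 1001: every node is reachable from the head
        System.out.println(realizedNodesFrom(rest));  // 1: nothing before `rest` is reachable from it
    }
}
```

As long as anything (such as nth's parameter) refers to the first node, all nodes realized so far stay strongly reachable through the tailValue chain. A reference to the current node alone keeps only that node alive.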

Your attempt to set the parameter variable to null, i.e.

static int nth(final int n, /*final*/ LazyishSeq lazySeqArg) {
    final LazyishSeq lazySeqLocal = lazySeqArg;
    lazySeqArg = null;
    return drop(n,lazySeqLocal).head();
}

does not help: the lazySeqArg variable is now null, but lazySeqLocal still holds a reference to the first element.

In general, a local variable does not prevent garbage collection. Collecting an object that is referenced but never used again is permitted, but that doesn’t imply that a particular implementation is capable of doing it.

In the case of the HotSpot JVM, only optimized code gets rid of such unused references. But here, nth is not a hot spot, as the heavy work happens within the drop method.

This is why the same issue does not appear in the drop method, even though it also holds a reference to the first element in its parameter variable. The drop method contains the loop doing the actual work, so it is very likely to be optimized by the JVM, which may eliminate unused variables and allow the already processed part of the sequence to be collected.

Many factors can affect the JVM’s optimizations. Besides the different shape of the code, it seems that rapid memory allocation during the unoptimized phase may also limit what the optimizer achieves. Indeed, when I run with -Xcomp, which forbids interpreted execution altogether, both variants run successfully; even int N = (int)1e9 is no longer a problem. Of course, forcing compilation increases the startup time.

I have to admit that I do not understand why the mixed mode performs that much worse and I’ll investigate further. But generally, you have to be aware that the efficiency of the garbage collector is implementation dependent, so objects collected in one environment may stay in memory in another.

Rolldiameter
#3 · 2019-03-31 02:23

Clojure implements a strategy for dealing with this sort of scenario which it calls "locals clearing". There's support for it in the compiler that makes it kick in automatically where required in pure Clojure code (unless disabled at compilation time – this is sometimes useful for debugging). Clojure does also clear locals in various places in its Java runtime, however, and the way it does that could be used in Java libraries and possibly even application code, though it would undoubtedly be somewhat cumbersome.

Before I get into what Clojure does, here's a short summary of what is going on in this example:

  1. nth(int, LazyishSeq) is implemented in terms of drop(int, LazyishSeq) and LazyishSeq.head().

  2. nth passes both its arguments to drop and has no further use for them.

  3. drop can easily be implemented so as to avoid holding on to the head of the passed-in sequence.

As written, however, nth still holds on to the head of its sequence argument while drop runs. The runtime may discard that reference, but it is not guaranteed to.
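Point 3 can be sketched as follows. This is only an illustration under stated assumptions: Seq is a simplified, single-threaded stand-in for LazyishSeq, and the drop variant differs from the question's version in that it reuses its (non-final) parameter as the loop cursor, so drop's own frame does not depend on the JIT to stop referring to the head.

```java
import java.util.function.Supplier;

class DropWithoutHeadSketch {
    // Simplified stand-in for the question's LazyishSeq (assumption: no locking).
    static final class Seq {
        final int head;
        Supplier<Seq> tailThunk;
        Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    static Seq naturals() {
        return from(1);
    }

    // The parameter itself is the cursor: each iteration overwrites the only
    // reference this frame has, so the nodes already passed are not pinned by drop.
    static Seq drop(int n, Seq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(drop(5, naturals()).head); // 6
    }
}
```

This only covers drop's own frame. Any caller that keeps the head in one of its own variables, as nth does, still pins the whole sequence.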

Clojure deals with this by explicitly clearing the reference to the sequence before control is handed off to drop. This is done using a rather elegant trick (the snippet below is from the GitHub source as of Clojure 1.9.0):

//  clojure/src/jvm/clojure/lang/Util.java

/**
 *   Copyright (c) Rich Hickey. All rights reserved.
 *   The use and distribution terms for this software are covered by the
 *   Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
 *   which can be found in the file epl-v10.html at the root of this distribution.
 *   By using this software in any fashion, you are agreeing to be bound by
 *   the terms of this license.
 *   You must not remove this notice, or any other, from this software.
 **/

// … beginning of the file omitted …

// the next line is the 190th in the file as of Clojure 1.9.0
static public Object ret1(Object ret, Object nil){
        return ret;
}

static public ISeq ret1(ISeq ret, Object nil){
        return ret;
}

// …

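Given the above (plus a ret1 overload that accepts and returns LazyishSeq, and with the final modifier removed from nth's lazySeq parameter so that it can be reassigned), the call to drop inside nth can be changed to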

drop(n, ret1(lazySeq, lazySeq = null))

Java evaluates method arguments from left to right. The first argument to ret1, the current value of lazySeq, is therefore evaluated first. Then lazySeq = null is evaluated as an expression: its value is null, and as a side effect the lazySeq local is set to null. Only after that is control transferred to ret1, which receives the reference to the sequence in its first argument and returns it, and that value is then passed to drop.

Thus drop receives the original value held by the lazySeq local, but the local itself is cleared before control is transferred to drop.

Consequently nth no longer holds on to the head of the sequence.
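Put together, the trick might look like the sketch below. The assumptions: LazyishSeq here is a simplified, single-threaded stand-in for the class in the question; the ret1 overload for LazyishSeq is hypothetical (Clojure's Util only provides overloads for its own types); nth's parameter is no longer final; and drop reuses its parameter as the cursor.

```java
import java.util.function.Supplier;

class LocalsClearingSketch {
    // Simplified stand-in for the question's LazyishSeq (assumption: no locking).
    static final class LazyishSeq {
        final int head;
        Supplier<LazyishSeq> tailThunk;
        LazyishSeq tailValue;

        LazyishSeq(int head, Supplier<LazyishSeq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        int head() {
            return head;
        }

        LazyishSeq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    static LazyishSeq from(int n) {
        return new LazyishSeq(n, () -> from(n + 1));
    }

    static LazyishSeq naturals() {
        return from(1);
    }

    static LazyishSeq drop(int n, LazyishSeq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }

    // Hypothetical overload modeled on Clojure's Util.ret1: returns its first
    // argument and ignores the second, which exists only for its side effect.
    static LazyishSeq ret1(LazyishSeq ret, Object nil) {
        return ret;
    }

    // lazySeq is deliberately not final so that it can be cleared.
    static int nth(int n, LazyishSeq lazySeq) {
        // Left-to-right evaluation: the reference is read first, then the local
        // is nulled, and only then are ret1 and drop invoked.
        return drop(n, ret1(lazySeq, lazySeq = null)).head();
    }

    public static void main(String[] args) {
        System.out.println(nth(10, naturals())); // 11
    }
}
```

Whether this is enough to avoid the OutOfMemoryError in a given run still depends on no other frame or field holding the head, so treat it as an illustration of the technique rather than a measured fix.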
