Why does a lambda need to capture the enclosing in

Posted 2019-02-04 15:30

疯言疯语

Question:

This is based on this question. Consider this example where a method returns a Consumer based on a lambda expression:

import java.util.function.Consumer;

public class TestClass {
    public static void main(String[] args) {
        MyClass m = new MyClass();
        Consumer<String> fn = m.getConsumer();

        System.out.println("Just to put a breakpoint");
    }
}

class MyClass {
    final String foo = "foo";

    public Consumer<String> getConsumer() {
        return bar -> System.out.println(bar + foo);
    }
}

As we know, referencing current state inside a lambda is not good practice when doing functional programming. One reason is that the lambda captures the enclosing instance, which then cannot be garbage collected until the lambda itself becomes unreachable.
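The retention effect described above can be made visible with a weak reference. This is only a minimal sketch: the class name RetentionSketch and the method instanceRetainedByLambda are made up for illustration, the field is deliberately assigned in the constructor so that it is not a compile-time constant, and System.gc() is merely a hint to the JVM.

```java
import java.lang.ref.WeakReference;
import java.util.function.Consumer;

// Hypothetical demo class, not part of the original question.
class RetentionSketch {
    private final String foo;

    RetentionSketch(String foo) {
        this.foo = foo; // assigned in the constructor, so not a compile-time constant
    }

    // The lambda reads an instance field, so it has to hold on to `this`.
    Consumer<String> getConsumer() {
        return bar -> System.out.println(bar + foo);
    }

    // Returns true if the instance is still reachable while only the lambda is held.
    static boolean instanceRetainedByLambda() {
        RetentionSketch owner = new RetentionSketch("foo");
        WeakReference<RetentionSketch> ref = new WeakReference<>(owner);
        Consumer<String> fn = owner.getConsumer();
        owner = null;   // drop our own strong reference
        System.gc();    // a hint only; the collector may ignore it
        boolean retained = ref.get() != null;
        fn.accept("bar"); // use the lambda after the check so it stays alive
        return retained;
    }

    public static void main(String[] args) {
        System.out.println("instance retained: " + instanceRetainedByLambda());
    }
}
```

Because the lambda is still strongly reachable when the weak reference is checked, and the lambda needs the instance to read foo, the instance cannot have been collected at that point.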

However, in this specific scenario involving a final string, it seems the compiler could have just enclosed the constant (final) string foo (from the constant pool) in the returned lambda, instead of enclosing the whole MyClass instance, which is what the debugger shows with a breakpoint placed at the System.out.println call. Does this have to do with the way lambdas are compiled to a special invokedynamic bytecode?

Answer 1:

If the lambda was capturing foo instead of this, you could in some cases get a different result. Consider the following example:

import java.util.function.Consumer;

public class TestClass {
    public static void main(String[] args) {
        MyClass m = new MyClass();
        m.consumer.accept("bar2");
    }
}

class MyClass {
    final String foo;
    final Consumer<String> consumer;

    public MyClass() {
        consumer = getConsumer();
        // first call to illustrate the value that would have been captured
        consumer.accept("bar1");
        foo = "foo";
    }

    public Consumer<String> getConsumer() {
        return bar -> System.out.println(bar + foo);
    }
}

Output:

bar1null
bar2foo

If foo itself were captured by the lambda, it would be captured as null, and the second call would print bar2null. However, since the MyClass instance is captured, it prints the correct value.
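The hypothetical "capture the value instead of this" behaviour can be simulated by copying the field into a local variable before the lambda is built. The following is a rough sketch only; EarlyCapture, viaThis and viaSnapshot are invented names, and Supplier is used instead of Consumer so the results are easy to inspect.

```java
import java.util.function.Supplier;

// Hypothetical demo class illustrating the difference described above.
class EarlyCapture {
    final String foo;
    final Supplier<String> viaThis;
    final Supplier<String> viaSnapshot;

    EarlyCapture() {
        viaThis = captureThis();         // built before foo is assigned
        viaSnapshot = captureSnapshot(); // copies foo now, while it is still null
        foo = "foo";
    }

    // Captures `this`; the field is read each time the supplier runs.
    private Supplier<String> captureThis() {
        return () -> foo;
    }

    // Captures only the current field value, which is null at this point.
    private Supplier<String> captureSnapshot() {
        String snapshot = foo;
        return () -> snapshot;
    }

    public static void main(String[] args) {
        EarlyCapture e = new EarlyCapture();
        System.out.println("via this: " + e.viaThis.get());         // via this: foo
        System.out.println("via snapshot: " + e.viaSnapshot.get()); // via snapshot: null
    }
}
```

The supplier that captured this sees the value assigned later in the constructor, while the snapshot keeps the null it copied.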

Of course this is ugly code and a bit contrived, but in more complex, real-life code, such an issue could somewhat easily occur.

Note that the only truly ugly thing is that we are forcing a read of the to-be-assigned foo in the constructor, through the consumer. Building the consumer itself is not expected to read foo at that time, so it is still legitimate to build it before assigning foo, as long as you don't use it immediately.

However, the compiler will not let you initialize the same consumer in the constructor before assigning foo, which is probably for the best :-)

Answer 2:

In your code, bar + foo is really shorthand for bar + this.foo; we're just so used to the shorthand that we forget we are implicitly fetching an instance member. So your lambda is capturing this, not this.foo.

If your question is "could this feature have been implemented differently", the answer is "probably yes"; we could have made the specification/implementation of lambda capture arbitrarily more complicated with the aim of providing incrementally better performance for a variety of special cases, including this one.

Changing the specification so that we captured this.foo instead of this wouldn't change much in the way of performance; it would still be a capturing lambda, which is a much bigger cost consideration than the extra field dereference. So I don't see this as providing a real performance boost.

Answer 3:

You are right, it technically could do so, because the field in question is final, but it doesn't.

However, if it is a problem that the returned lambda retains the reference to the MyClass instance, then you can easily fix it yourself:

public Consumer<String> getConsumer() {
    String f = this.foo;
    return bar -> System.out.println(bar + f);
}

Note that if the field hadn't been final, your original code would use the value the field has at the time the lambda is executed, while the code listed here would use the value it had at the time getConsumer() was executed.
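To make that timing difference concrete, here is a small sketch with a deliberately non-final field. The names Holder, live and snapshot are made up for this illustration and do not appear in the original code.

```java
import java.util.function.Supplier;

// Hypothetical demo class with a mutable field.
class Holder {
    String value = "before"; // deliberately non-final

    // Captures `this`; reads this.value every time the supplier runs.
    Supplier<String> live() {
        return () -> value;
    }

    // Copies the field into a local first, so only that String is captured.
    Supplier<String> snapshot() {
        String v = this.value;
        return () -> v;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        Supplier<String> live = h.live();
        Supplier<String> snap = h.snapshot();
        h.value = "after"; // mutate after both suppliers were created
        System.out.println("live: " + live.get());     // live: after
        System.out.println("snapshot: " + snap.get()); // snapshot: before
    }
}
```

The local-copy variant also no longer needs the Holder instance, which is the point of the fix shown above.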

Answer 4:

Note that any ordinary Java access to a variable that is a compile-time constant is replaced by the constant value, so, contrary to what some people have claimed, it is immune to initialization-order issues.

We can demonstrate this by the following example:

import java.util.function.Supplier;

abstract class Base {
    Base() {
        // bad coding style; don't do this in real code
        printValues();
    }
    void printValues() {
        System.out.println("var1 read: "+getVar1());
        System.out.println("var2 read: "+getVar2());
        System.out.println("var1 via lambda: "+supplier1().get());
        System.out.println("var2 via lambda: "+supplier2().get());
    }
    abstract String getVar1();
    abstract String getVar2();
    abstract Supplier<String> supplier1();
    abstract Supplier<String> supplier2();
}
public class ConstantInitialization extends Base {
    final String realConstant = "a constant";
    final String justFinalVar; { justFinalVar = "a final value"; }

    ConstantInitialization() {
        System.out.println("after initialization:");
        printValues();
    }
    @Override String getVar1() {
        return realConstant;
    }
    @Override String getVar2() {
        return justFinalVar;
    }
    @Override Supplier<String> supplier1() {
        return () -> realConstant;
    }
    @Override Supplier<String> supplier2() {
        return () -> justFinalVar;
    }
    public static void main(String[] args) {
        new ConstantInitialization();
    }
}

It prints:

var1 read: a constant
var2 read: null
var1 via lambda: a constant
var2 via lambda: null
after initialization:
var1 read: a constant
var2 read: a final value
var1 via lambda: a constant
var2 via lambda: a final value

So, as you can see, even though the write to the realConstant field has not yet happened when the super constructor is executed, no uninitialized value is seen for the true compile-time constant, even when accessing it via a lambda expression. Technically, this is because the field isn’t actually read.

Also, nasty Reflection hacks have no effect on ordinary Java access to compile-time constants, for the same reason. The only way to read such a modified value back is via Reflection:

import java.lang.reflect.Field;
import java.util.function.Consumer;

public class TestCapture {
    static class MyClass {
        final String foo = "foo";
        private Consumer<String> getFn() {
          //final String localFoo = foo;
          return bar -> System.out.println("lambda: " + bar + foo);
        }
    }
    public static void main(String[] args) throws ReflectiveOperationException {
        final MyClass obj = new MyClass();
        Consumer<String> fn = obj.getFn();
        // change the final field obj.foo
        Field foo=obj.getClass().getDeclaredFields()[0];
        foo.setAccessible(true);
        foo.set(obj, "bar");
        // prove that our lambda expression doesn't read the modified foo
        fn.accept("");
        // show that it captured obj
        Field capturedThis=fn.getClass().getDeclaredFields()[0];
        capturedThis.setAccessible(true);
        System.out.println("captured obj: "+(obj==capturedThis.get(fn)));
        // and obj.foo contains "bar" when actually read
        System.out.println("via Reflection: "+foo.get(capturedThis.get(fn)));
        // but no ordinary Java access will actually read it
        System.out.println("ordinary field access: "+obj.foo);
    }
}

It prints:

lambda: foo
captured obj: true
via Reflection: bar
ordinary field access: foo

This shows us two things:

- Reflection also has no effect on compile-time constants.
- The surrounding object has been captured, even though it won’t be used.

I’d be happy to find an explanation like, “any access to an instance field requires the lambda expression to capture the instance of that field (even if the field is not actually read)”, but unfortunately I couldn’t find any statement regarding capturing of values or this in the current Java Language Specification, which is a bit frightening:

We got used to the fact that not accessing instance fields in a lambda expression will create an instance which doesn’t have a reference to this, but even that isn’t actually guaranteed by the current specification. It’s important that this omission gets fixed soon…