-->

Discrete Anonymous methods sharing a class?

2019-01-09 11:13发布

问题:

I was playing a bit with Eric Lippert's Ref<T> class from here. I noticed in the IL that it looked like both anonymous methods were using the same generated class, even though that meant the class had an extra variable.

While using only one new class definition seems somewhat reasonable, it strikes me as very odd that only one instance of <>c__DisplayClass2 is created. This seems to imply that both instances of Ref<T> are referencing the same <>c__DisplayClass2 Doesn't that mean that y cannot be collected until vart1 is collected, which may happen much later than after joik returns? After all, there is no guarantee that some idiot won't write a function (directly in IL) which directly accesses y through vart1 aftrer joik returns. Maybe this could even be done with reflection instead of via crazy IL.

sealed class Ref<T>
{
    public delegate T Func<T>();
    private readonly Func<T> getter;
    public Ref(Func<T> getter)
    {
        this.getter = getter;
    }
    public T Value { get { return getter(); } }
}

static Ref<int> joik()
{
    int[] y = new int[50000];
    int x = 5;
    Ref<int> vart1 = new Ref<int>(delegate() { return x; });
    Ref<int[]> vart2 = new Ref<int[]>(delegate() { return y; });
    return vart1;
}

Running IL DASM confirmed that vart1 and vart2 both used <>__DisplayClass2, which contained a public field for x and for y. The IL of joik:

.method private hidebysig static class Program/Ref`1<int32> 
        joik() cil managed
{
  // Code size       72 (0x48)
  .maxstack  3
  .locals init ([0] class Program/Ref`1<int32> vart1,
           [1] class Program/Ref`1<int32[]> vart2,
           [2] class Program/'<>c__DisplayClass2' '<>8__locals3',
           [3] class Program/Ref`1<int32> CS$1$0000)
  IL_0000:  newobj     instance void Program/'<>c__DisplayClass2'::.ctor()
  IL_0005:  stloc.2
  IL_0006:  nop
  IL_0007:  ldloc.2
  IL_0008:  ldc.i4     0xc350
  IL_000d:  newarr     [mscorlib]System.Int32
  IL_0012:  stfld      int32[] Program/'<>c__DisplayClass2'::y
  IL_0017:  ldloc.2
  IL_0018:  ldc.i4.5
  IL_0019:  stfld      int32 Program/'<>c__DisplayClass2'::x
  IL_001e:  ldloc.2
  IL_001f:  ldftn      instance int32 Program/'<>c__DisplayClass2'::'<joik>b__0'()
  IL_0025:  newobj     instance void class Program/Ref`1/Func`1<int32,int32>::.ctor(object,
                                                                                    native int)
  IL_002a:  newobj     instance void class Program/Ref`1<int32>::.ctor(class Program/Ref`1/Func`1<!0,!0>)
  IL_002f:  stloc.0
  IL_0030:  ldloc.2
  IL_0031:  ldftn      instance int32[] Program/'<>c__DisplayClass2'::'<joik>b__1'()
  IL_0037:  newobj     instance void class Program/Ref`1/Func`1<int32[],int32[]>::.ctor(object,
                                                                                        native int)
  IL_003c:  newobj     instance void class Program/Ref`1<int32[]>::.ctor(class Program/Ref`1/Func`1<!0,!0>)
  IL_0041:  stloc.1
  IL_0042:  ldloc.0
  IL_0043:  stloc.3
  IL_0044:  br.s       IL_0046
  IL_0046:  ldloc.3
  IL_0047:  ret
} // end of method Program::joik

回答1:

Yes, the MS implementation of anonymous methods effectively creates one hidden class per level of scope that it needs to capture variables from, and captures all the relevant variables from that scope. I believe this is done for the sake of simplicity, but it can indeed increase the lifetime of some objects unnecessarily.

It would be more elegant for each anonymous method to only capture the variables it was actually interested in. However, this could make life considerably more complicated... if one anonymous method captured x and y, one captured x and one captured y, you'd need three classes: one for capturing x, one for capturing y, and one for composing the two (but not just having two variables). The tricky bit is that for any single variable instantiation, that variable needs to live in exactly one place so that everything which refers to it sees the same value, whatever changes it.

This doesn't violate the spec in any way, but it could be considered unfortunate - I don't know whether it's actually bitten people in real life, but it's certainly possible.

The good news is that if the C# team decide to improve this, they should be able to do so in an entirely backwardly compatible way, unless some muppets are relying on lifetimes being extended unnecessarily.



回答2:

Jon is of course right. The problem that this usually causes is:

void M()
{
    Expensive e = GetExpensive();
    Cheap c = GetCheap();
    D longLife = ()=>...c...;
    D shortLife = ()=>...e...;
    ...
}

So we have an expensive resource whose lifetime now depends on the lifetime of longLife, even though shortLife is collected early.

This is unfortunate, but common. The implementations of closures in JScript and VB have the same problem.

I'd like to solve it in a hypothetical future version of C# but I make no guarantees. The obvious way to do it is to identify equivalence classes of closed-over variables based on which lambdas they are captured by, and generate closure classes one per equivalence class, rather than a single closure class.

There also might be things we could do with analysis of what closed-over variables are written to. As Jon notes, we are restricted at present by our need to capture variables rather than values. We could be more flexible in our code generation strategy if we identified variables that are never written to after the closure is created, and make those into closed-over values rather than closed-over variables.