Weird behaviour of the C# compiler due to caching delegates

Published 2020-07-08 06:42

Question:

Suppose I have the following program:

static void SomeMethod(Func<int, int> otherMethod)
{
    otherMethod(1);
}

static int OtherMethod(int x)
{
    return x;
}

static void Main(string[] args)
{
    SomeMethod(OtherMethod);
    SomeMethod(x => OtherMethod(x));
    SomeMethod(x => OtherMethod(x));
}

I cannot understand the compiled IL code (it contains a lot of extra code). Here is a simplified version:

class C
{
    // Singleton instance that serves as the target of the cached delegates.
    public static C c;
    // One cached delegate field per lambda occurrence.
    public static Func<int, int> foo;
    public static Func<int, int> foo1;
    static C()
    {
        c = new C();
    }
    C() { }
    // One method per lambda, even though the two bodies are identical.
    public int b(int x)
    {
        return OtherMethod(x);
    }
    public int b1(int x)
    {
        return OtherMethod(x);
    }
}

static void Main()
{
    SomeMethod(new Func<int, int>(OtherMethod));
    if (C.foo != null)
        SomeMethod(C.foo);
    else
    {
        C.foo = new Func<int, int>(C.c.b);
        SomeMethod(C.foo);
    }
    if (C.foo1 != null)
        SomeMethod(C.foo1);
    else
    {
        C.foo1 = new Func<int, int>(C.c.b1);
        SomeMethod(C.foo1);
    }
}

Why does the compiler create the two non-static, identical methods b and b1? (Identical meaning that they contain the same code.)

Answer 1:

Your question is: why did the compiler not realize that the two lines

SomeMethod(x => OtherMethod(x));
SomeMethod(x => OtherMethod(x));

are the same, and rewrite this as

if ( delegate is not created ) 
  create the delegate and stash it away
SomeMethod( the delegate );
SomeMethod( the delegate );

? Well, let me answer that question in several ways.
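For concreteness, here is a hand-written C# version of that rewrite, assuming the SomeMethod and OtherMethod definitions from the question (the field name s_cached is invented for illustration; no compiler emits exactly this today):

static Func<int, int> s_cached;

static void Main(string[] args)
{
    SomeMethod(OtherMethod);
    // Create the delegate once and stash it away, then reuse it at both call sites.
    if (s_cached == null)
        s_cached = x => OtherMethod(x);
    SomeMethod(s_cached);
    SomeMethod(s_cached);
}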

First off, is the compiler permitted to make that optimization? Yes. The specification calls out that a C# compiler is permitted to make two lambdas that do exactly the same thing into a single delegate. And in fact you can see that it already does this optimization in part: it creates each delegate once and saves it away so that it doesn't have to create it again later when the code is called again. Notice that this is a waste of memory in the case where the code is only called once.
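You can observe the existing per-lambda caching at runtime; the following sketch prints True with the current Roslyn compiler, though since the spec merely permits the optimization, another compiler (or a future version) could legitimately print False:

using System;

class CachingDemo
{
    static int OtherMethod(int x) => x;

    // Each call evaluates the same non-capturing lambda expression.
    static Func<int, int> Get() => x => OtherMethod(x);

    static void Main()
    {
        var a = Get();
        var b = Get();
        // The delegate is created once and cached in a static field,
        // so both calls return the same instance.
        Console.WriteLine(ReferenceEquals(a, b));
    }
}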

Second, is the compiler required to make the caching optimization? No. The specification calls out that the compiler is only permitted to make the optimization, but not required to.

Is the compiler required to make the optimization you want? Obviously not, because it doesn't. It is permitted to, and maybe a future version of the compiler will. The compiler is open-source; if you care about this optimization, go write it and submit a pull request.

Third, is it possible to make the optimization you want? Yes. The compiler could take all pairs of lambdas that appear in the same method, compile them to the internal tree format, and do a tree comparison to see if they have the same content, and then generate the same static backing field for both.
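A very rough sketch of such a pass follows; Lambda and its members are invented stand-ins for the compiler's internal bound-tree types, and a real implementation would compare trees rather than strings:

using System.Collections.Generic;

// Invented stand-in for a bound lambda: a canonical rendering of its body
// plus the name of the static field that caches its delegate.
class Lambda
{
    public string Body = "";
    public string CacheField = "";
}

static class LambdaDedup
{
    public static void Deduplicate(List<Lambda> lambdas)
    {
        // Compare all pairs; note that this is superlinear in the number
        // of lambdas, a cost that comes up again later in this answer.
        for (int i = 0; i < lambdas.Count; i++)
            for (int j = i + 1; j < lambdas.Count; j++)
                if (lambdas[j].Body == lambdas[i].Body)
                    lambdas[j].CacheField = lambdas[i].CacheField;
    }
}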

So now we have a situation: the compiler is permitted to make a particular optimization, and it doesn't. And you've asked "why not"? That's an easy question to answer: no optimization is implemented until someone spends the considerable time and effort to:

  • Carefully design the optimization: under precisely what conditions is the optimization triggered and not triggered? How general should the optimization be? You've suggested that they detect similar lambda bodies, but why stop there? You have two identical statements of code, so why not generate the code for those statements once instead of twice? What if you had a repeated group of statements? There is a huge amount of design work to do here.
  • In particular, an important aspect of the design is: could the user reasonably do the optimization "by hand" while still keeping the code readable? In this case, yes, they could, easily: just assign the duplicated lambda to a variable and then use the variable (see the sketch after this list). An optimization which does automatically something that a user who cared could have done themselves easily is not really a very interesting or compelling optimization.
  • Your examples are trivial; real-world code is not. What does your proposed design do with identical nested lambdas? And so on.
  • Does your optimization cause the behaviour of the code in the debugger to "look weird"? You have probably noticed that when debugging code that was compiled with optimizations turned on, the debugger seems to behave weirdly; that's because there's no longer a clear mapping between the generated code and the original code. Does your optimization make that worse? Is it acceptable to users? Does the debugger need to be aware of the optimization? If so, you'll have to change the debugger. In this case, probably not, but these are questions you have to ask and answer.
  • Get the design reviewed by experts; this takes up their time, and will likely result in changes to the design.
  • Make estimates of the pros and cons of the optimization -- optimizations often have hidden costs, like the memory leak I mentioned before. In particular, optimizations often preclude other optimizations which might be better.
  • Make estimates as to the total savings world-wide of this optimization. Does the optimization actually affect real-world code? Does it change the correctness of that code? Is there any production code, anywhere in the world, that would break with this optimization and cause the CTO of company X to call the CTO of Microsoft demanding a fix? If the answer is yes then maybe you might want to not do this optimization. C# is not a toy. Millions and millions of people depend on its correct operation every day.
  • What's the estimated burden of doing the optimization on compile time? Compilation doesn't have to happen between keystrokes but it does have to be pretty fast. Anything which introduces a superlinear algorithm in a common code path in the compiler is going to be unacceptable. Can you implement your optimization so that it is linear in code size? Note that the algorithm I sketched before -- compare all pairs -- is superlinear in code size. (Exercise: what's the worst case asymptotic performance of doing a tree comparison on all pairs of lambdas?)
  • Actually implement the optimization. I encourage you to do so.
  • Test the optimization; does it actually produce better code? On what metric? An optimization which causes no change to any metric is not an optimization.
  • Sign up to fix bugs in the optimization forever.
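
Here is the sketch promised in the list above: the "by hand" fix is a two-line change, again assuming the question's SomeMethod and OtherMethod:

static void Main(string[] args)
{
    SomeMethod(OtherMethod);
    // Hoist the duplicated lambda into a local and reuse it.
    Func<int, int> f = x => OtherMethod(x);
    SomeMethod(f);
    SomeMethod(f);
}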

The optimization you want simply doesn't meet the bar. No one writes code like that. If they did, and they cared that it duplicated an object, they could easily fix it themselves. So the optimization optimizes code that doesn't exist, in order to get a "win" that is the construction of a single object amongst the millions and millions of objects the program will allocate. Not worth it.

But again, if you think it is, go ahead and implement it and submit a pull request. Make sure to submit the results of the investigations I noted above, because those are where the real work is. The implementation is usually the smallest part of the total effort spent on a feature; that's why C# is a successful language.