Why is lambda faster than IL injected dynamic method?

Posted 2019-03-15 07:23

Question:

I just built a dynamic method (see below, thanks to fellow SO users). It appears that the Func created as a dynamic method with IL injection is 2x slower than the lambda.

Does anyone know exactly why?

(EDIT: this was built as Release x64 in VS2010. Please run it from the console, not from inside Visual Studio with F5.)

using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        var res = mul1(4);

        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4);

        Console.WriteLine(res);

        double d, acc = 0;

        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i);
                acc += d;
            }

            sw.Stop();

            time1 = sw.ElapsedMilliseconds;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i);
                acc += d;
            }

            sw.Stop();

            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) } );

        var il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    static Func<int, int> EmbedConstFunc(int b)
    {
        return a => a * b;
    }
}

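Here is the output (on an i7 920). In each timing row, the first column is the lambda (mul2) time and the second is the IL dynamic method (mul1) time, both in milliseconds for 10,000,000 calls: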

20
20

25     51
25     51
24     51
24     51
24     51
25     51
25     51
25     51
24     51
24     51

4.9999995E+15...

============================================================================

EDIT EDIT EDIT EDIT

Here is proof that dhtorpe was right: a more complex lambda loses its advantage. The following code demonstrates that the lambda has exactly the same performance as the IL-injected method:

using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        double res = mul1(4,6);

        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4,6);

        Console.WriteLine(res);

        double d, acc = 0;

        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i, i+1);
                acc += d;
            }

            sw.Stop();

            time1 = sw.ElapsedMilliseconds;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i, i + 1);
                acc += d;
            }

            sw.Stop();

            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int, double> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });

        var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });

        var il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);

        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);

        il.Emit(OpCodes.Call, log);

        il.Emit(OpCodes.Sub);

        il.Emit(OpCodes.Ret);

        return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
    }

    static Func<int, int, double> EmbedConstFunc(int b)
    {
        return (a, z) => a * b - Math.Log(z * b);
    }
} 

Answer 1:

Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.

When compiled in release mode (optimizations on) and run without the debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.

Running the same release-mode optimized build with a debugger attached to the process drops the lambda's performance to a level comparable to, or worse than, the generated IL dynamic method.

The only difference between these two runs is the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code generation optimizations in order to preserve the mappings from native instructions to IL instructions to source line numbers, along with other correlations that aggressive native optimizations would destroy.
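Since the comparison hinges on whether the JIT is optimizing, a benchmark can check the two conditions mentioned here before it starts timing. A minimal sketch, assuming a hypothetical `JitConditions` helper (the name is illustrative, not a framework API); `Debugger.IsAttached` and `DebuggableAttribute.IsJITOptimizerDisabled` are standard .NET members. It can only flag likely problems; it cannot prove that the JIT actually optimized any particular method.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper: flags the two situations in which the JIT is likely
// to skip optimizations. It does not inspect the generated machine code.
public static class JitConditions
{
    // True when a managed debugger is attached to this process.
    public static bool DebuggerAttached
    {
        get { return Debugger.IsAttached; }
    }

    // True when the assembly asks the JIT to disable optimizations
    // (what a Debug build typically emits via DebuggableAttribute).
    public static bool JitOptimizerDisabled(Assembly assembly)
    {
        var attr = (DebuggableAttribute)Attribute.GetCustomAttribute(
            assembly, typeof(DebuggableAttribute));
        // No attribute at all means the JIT is free to optimize.
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    // Prints a warning before a benchmark if either condition holds.
    public static void WarnIfNotOptimized(Assembly assembly)
    {
        if (DebuggerAttached)
            Console.WriteLine("Warning: debugger attached; JIT optimizations may be suppressed.");
        if (JitOptimizerDisabled(assembly))
            Console.WriteLine("Warning: assembly built with JIT optimizations disabled (Debug build?).");
    }
}
```

For example, calling `JitConditions.WarnIfNotOptimized(typeof(Program).Assembly);` at the top of `Main` in the question's code would flag an F5 run from Visual Studio.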

A compiler can only apply special case optimizations when the input expression graph (in this case, IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and is emitting different code for lambdas than for "normal" IL code.

It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL encodes the b value as an inline constant, whereas the analogous lambda expression loads a field from a compiler-generated object that holds the captured variable. Even if your generated IL were to mimic the captured-field pattern of the IL the C# compiler generates for the lambda, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
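To make the difference concrete: for `a => a * b` in EmbedConstFunc, the C# compiler generates roughly the following. This is a hand-written approximation; the real generated names (such as `<>c__DisplayClass...`) are not valid C#, so `EmbedConstClosure`, `ClosureDemo` and `EmbedConstFuncByHand` are hypothetical. The captured `b` lives in a field that is read on every call, whereas the question's emitted IL bakes `b` in with `ldc.i4`.

```csharp
using System;

// Hand-written approximation of the closure class the C# compiler emits
// for `a => a * b`. Names are hypothetical.
public sealed class EmbedConstClosure
{
    // The captured variable becomes a field on a heap-allocated object.
    public int b;

    // The lambda body becomes an instance method that reads the field
    // on every call (ldarg.0; ldfld b) instead of using an inline constant.
    public int Invoke(int a)
    {
        return a * b;
    }
}

public static class ClosureDemo
{
    public static Func<int, int> EmbedConstFuncByHand(int b)
    {
        var closure = new EmbedConstClosure();
        closure.b = b;
        // Delegate bound to an instance method: the closure object is the target.
        return closure.Invoke;
    }
}
```

`ClosureDemo.EmbedConstFuncByHand(5)(4)` returns 20, the same as `EmbedConstFunc(5)(4)`.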

As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If this is the case, I would expect to see this difference in performance disappear in more complex lambda expressions, since inlining is usually reserved for only the simplest of expressions.



Answer 2:

The constant 5 was the cause. Why on earth could that be? Reason: when the JIT knows the constant is 5, it does not emit an imul instruction but a lea with the address expression [rax + rax*4], which computes rax * 5. This is a well-known assembly-level optimization. But for some reason, this code executed slower. The optimization was a pessimization.

And the C# compiler emitting a closure prevented the JIT from optimizing the code in that particular way.

Proof: Change the constant to 56878567 and the performance changes. When inspecting the JITed code you can see that an imul is used now.
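One way to reproduce this is to time the question's IL_EmbedConst with both constants side by side. A rough sketch, assuming a hypothetical `ConstantExperiment` class: it reuses the question's IL unchanged and varies only the embedded constant. It prints timings without implying any particular result, since which constant wins depends on the CPU and JIT version; run it as a Release build without the debugger.

```csharp
using System;
using System.Diagnostics;
using System.Reflection.Emit;

// Hypothetical harness: same IL, two different embedded constants.
public static class ConstantExperiment
{
    // Same IL as IL_EmbedConst in the question: arg0 * b, with b inlined via ldc.i4.
    public static Func<int, int> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    // Calls f for i = 0..n-1 and returns elapsed milliseconds. The sum is
    // returned through acc so the loop's work is observable.
    public static long Time(Func<int, int> f, int n, out long acc)
    {
        acc = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
            acc += f(i);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Run(int n)
    {
        var small = IL_EmbedConst(5);          // per the answer, JITed as lea
        var large = IL_EmbedConst(56878567);   // per the answer, JITed as imul
        long acc1, acc2;
        for (int k = 0; k < 5; k++)
        {
            long t1 = Time(small, n, out acc1);
            long t2 = Time(large, n, out acc2);
            Console.WriteLine("{0,6} {1,6}   ({2} {3})", t1, t2, acc1, acc2);
        }
    }
}
```

`ConstantExperiment.Run(10000000)` mirrors the question's loop size. The multiplication by 56878567 wraps around silently, because the IL `mul` opcode is unchecked.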

I managed to catch this by hardcoding the constant 5 into the lambda like this:

    static Func<int, int> EmbedConstFunc2(int b)
    {
        return a => a * 5;
    }

This allowed me to inspect the JITed x86.

Sidenote: The .NET JIT does not inline delegate calls in any way. Just mentioning this because it was falsely speculated this was the case in the comments.

Sidenote 2: To get the full JIT optimization level, you need to compile in Release mode and start without a debugger attached. The debugger prevents optimizations from being performed, even in Release mode.

Sidenote 3: Although EmbedConstFunc contains a closure and would normally be slower than the dynamically generated method, the "lea" optimization applied to the dynamic method does more damage, so the dynamic method ends up slower.



Answer 3:

A lambda is not faster than a DynamicMethod; it is built on one. However, while calling a static method is faster than calling an instance method, invoking a delegate bound to a static method is slower than invoking a delegate bound to an instance method. A lambda expression builds a static method but uses it like an instance method by adding a "Closure" as the first parameter. A delegate to a static method has to "pop" the stack to get rid of the unneeded "this" instance before jumping ("mov") to the real IL body; a delegate to an instance method hits the IL body directly. This is why a delegate to the hypothetical static method built by a lambda expression is faster (maybe a side effect of the delegate code being shared between instance and static methods).

The performance issue can be avoided by adding an unused first argument (of the Closure type, for example) to the DynamicMethod and calling CreateDelegate with an explicit target instance (null can be used):

var myDelegate = dynamicMethod.CreateDelegate(typeof(MyDelegateType), null) as MyDelegateType;

http://msdn.microsoft.com/fr-fr/library/z43fsh67(v=vs.110).aspx
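Applied to the question's IL_EmbedConst, the workaround might look like the sketch below. It assumes `object` as the type of the dummy first parameter, and the names `ClosedDynamicMethod` and `IL_EmbedConstClosed` are hypothetical. Binding the delegate to a target (here null) removes that first parameter from the delegate's signature, so the result is still a `Func<int, int>`; whether this closes the speed gap on a given runtime is the answer's claim, not something the code itself demonstrates.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical names; sketch of the "unused first argument + explicit target" trick.
public static class ClosedDynamicMethod
{
    public static Func<int, int> IL_EmbedConstClosed(int b)
    {
        var method = new DynamicMethod(
            "EmbedConstClosed",
            typeof(int),
            new[] { typeof(object), typeof(int) }); // arg0: unused "closure", arg1: a

        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1);      // load a; arg0 is the dummy target and is never read
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        // The delegate is closed over arg0 (bound to null), leaving Func<int, int>.
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>), null);
    }
}
```

`ClosedDynamicMethod.IL_EmbedConstClosed(5)` can be dropped into the question's benchmark in place of `IL_EmbedConst(5)` to compare the two delegate shapes.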

Tony THONG



Tags: c# .net-4.0