Understanding the various options for runtime code

Posted 2020-06-25 07:03

Question:

I'm working on an application where I'd like to dynamically generate code for a numerical calculation (for performance). Doing this calculation as a data-driven operation is too slow. To describe my requirements, consider this class:

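using System.Collections.Generic;

class Simulation
{
    // Current value of every node in the simulation, keyed by node name.
    readonly Dictionary<string, double> nodes = new Dictionary<string, double>();

    // Current simulation time and time step.
    double t, dt;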

    private void ProcessOneSample()
    {
        t += dt;
        // Expensive operation that computes the state of nodes at the current t.
    }

    public void Process(int N, IDictionary<string, double[]> Input, IDictionary<string, double[]> Output)
    {
        for (int i = 0; i < N; ++i)
        {
            foreach (KeyValuePair<string, double[]> j in Input)
                nodes[j.Key] = j.Value[i];
            ProcessOneSample();
            foreach (KeyValuePair<string, double[]> j in Output)
                j.Value[i] = nodes[j.Key];
        }    
    }
}

What I want to do is JIT compile a function that implements the outer loop in Process. The code that defines this function will be generated by the data that is currently used to implement ProcessOneSample. To clarify what I'm expecting, I'd expect all of the dictionary lookups to be performed once in the compilation process (i.e. the JIT compile would bind directly to the relevant object in the dictionary), so that when the compiled code is actually executed, it is as if all of the lookups had been hardcoded.

What I'm trying to figure out is what the best tools are to tackle this problem. The reason I'm asking this question is because there are so many options:

  • Use Roslyn. My current stumbling block is how to bind expressions in the syntax to member variables from the host program (i.e. the values in the 'nodes' dictionary). Is this possible?
  • Use LINQ Expressions (Expression.Compile).
  • Use CodeDom. I only recently became aware of this in my Google searching, and it is what prompted this question. I'm not too stoked on stumbling my way through a third compilation framework in .NET.
  • My original plan, before I knew any of these tools existed, was to call native code (x86) that I JIT compiled myself. I have some experience with this, but there are a lot of unknowns here that I have not solved yet. This is also my backup option if the performance of the above options is not sufficient. I'd prefer one of the three options above because I am sure they will be much, much simpler, assuming I can get one of them to work!

Does anyone have any experience with something like this that they would be able to share?

Answer 1:

I'm not sure I understand your example, nor that code generation is the best way to improve its performance.

But if you want to understand the code generation options, first consider your requirements. Performance is what you want, but there is the performance of the code generation and the performance of the generated code. These are definitely not the same thing. Then there is the writability and readability of your code. Different options score very differently on this one.

Your first option is Reflection.Emit, especially DynamicMethod. Reflection.Emit is a fairly low-level API and is efficient (i.e. the code generation itself is fast). Furthermore, because you have complete control over the code being generated, you have the potential to generate the most efficient code (or very bad code, obviously). You are also not restricted to what a language such as C# allows you to do; the full power of the CLR is at your fingertips. The biggest problem with Reflection.Emit is the large volume of code you need to write and the deep knowledge of IL required to do so. Writing that code is not easy, and neither is reading or maintaining it afterwards.
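To make the trade-off concrete, here is a minimal DynamicMethod sketch. It is not the asker's simulation: it assumes the node values live in a double[] and that a hypothetical name-to-index map (slots) exists. The names EmitSketch and BuildProduct are made up for illustration. The point is that the dictionary lookups happen once, while the IL is being emitted, and the generated method only sees constant indices.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

static class EmitSketch
{
    // Builds a delegate that computes state[a] * state[b], with both slot
    // indices baked into the IL as constants. "slots" is a hypothetical
    // name-to-index map, not part of the original question.
    public static Func<double[], double> BuildProduct(
        IReadOnlyDictionary<string, int> slots, string a, string b)
    {
        // The dictionary lookups happen here, once, at generation time.
        int ia = slots[a];
        int ib = slots[b];

        var method = new DynamicMethod(
            "Product", typeof(double), new[] { typeof(double[]) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);     // push the state array
        il.Emit(OpCodes.Ldc_I4, ia);  // push the constant index of a
        il.Emit(OpCodes.Ldelem_R8);   // replace both with state[ia]
        il.Emit(OpCodes.Ldarg_0);     // push the state array again
        il.Emit(OpCodes.Ldc_I4, ib);  // push the constant index of b
        il.Emit(OpCodes.Ldelem_R8);   // replace both with state[ib]
        il.Emit(OpCodes.Mul);         // state[ia] * state[ib]
        il.Emit(OpCodes.Ret);

        return (Func<double[], double>)method.CreateDelegate(
            typeof(Func<double[], double>));
    }
}
```

Even this one-line computation needs eight IL instructions and an understanding of the evaluation stack, which is the maintenance cost described above.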

System.Linq.Expressions, and more specifically the Compile method, provides a nice alternative. You can think of it as essentially a type-safe wrapper around DynamicMethod generation with Reflection.Emit. There is some overhead in generating the code, which will probably not be a big problem. As for freedom of expression, you can do pretty much everything you can do in a normal C# method. You do not have complete control over the generated code, but the quality is generally very good. The biggest advantage of this approach is that it is much easier to write and read a program using this technique.
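As a rough sketch of the expression-tree approach, the following builds and compiles a single update step, state[target] += dt * state[rate]. It assumes the same hypothetical layout as before: node values in a double[] plus a name-to-index map. ExpressionSketch and BuildStep are illustrative names, not anything from the question. The indices are resolved from the dictionary while the tree is built and end up as constants in the compiled delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class ExpressionSketch
{
    // Builds a delegate for: state[target] = state[target] + dt * state[rate].
    // "slots" is a hypothetical name-to-index map used only for illustration.
    public static Action<double[], double> BuildStep(
        IReadOnlyDictionary<string, int> slots, string target, string rate)
    {
        ParameterExpression state = Expression.Parameter(typeof(double[]), "state");
        ParameterExpression dt = Expression.Parameter(typeof(double), "dt");

        // Lookups are resolved once, here; the indices become constants in the tree.
        Expression targetSlot = Expression.ArrayAccess(
            state, Expression.Constant(slots[target]));
        Expression rateSlot = Expression.ArrayAccess(
            state, Expression.Constant(slots[rate]));

        Expression body = Expression.Assign(
            targetSlot,
            Expression.Add(targetSlot, Expression.Multiply(dt, rateSlot)));

        return Expression
            .Lambda<Action<double[], double>>(body, state, dt)
            .Compile();
    }
}
```

Expression.Constant also accepts a reference to a live object, so a tree can bind directly to something owned by the host program. That is one possible way to get the "bind to the relevant object" behaviour the question asks about.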

As for Roslyn, you have the option of generating a syntax tree directly, or generating C# (or VB) source and having it parsed into a syntax tree to be compiled. It is way too early to guess what the performance might be, as no production release is available (at the time of writing). Obviously, parsing source text into a syntax tree will have some overhead, and if you are generating a single method, Roslyn's ability to compile multiple methods in parallel won't help much. Using Roslyn does have the potential to allow for very readable programs, though.
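For comparison, here is a sketch of the "generate C# text and compile it" route. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package as it was eventually released, so it may not match the preview API available when this answer was written. The generated type and method names (Generated, Sum) and the helper RoslynSketch.BuildSum are hypothetical.

```csharp
using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class RoslynSketch
{
    // Generates C# source with the two indices hardcoded, compiles it in
    // memory, and returns a delegate to the generated method.
    public static Func<double[], double> BuildSum(int ia, int ib)
    {
        string source =
            "public static class Generated {" +
            "  public static double Sum(double[] state) {" +
            $"    return state[{ia}] + state[{ib}];" +
            "  }" +
            "}";

        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "GeneratedAssembly",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var stream = new MemoryStream())
        {
            var result = compilation.Emit(stream);
            if (!result.Success)
                throw new InvalidOperationException("Generated code failed to compile.");

            // Load the emitted assembly and bind a delegate to Generated.Sum.
            Assembly assembly = Assembly.Load(stream.ToArray());
            MethodInfo method = assembly.GetType("Generated").GetMethod("Sum");
            return (Func<double[], double>)method.CreateDelegate(
                typeof(Func<double[], double>));
        }
    }
}
```

The generated source is ordinary, readable C#, which is the readability advantage mentioned above. The cost is a full parse and compile for every piece of generated code.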

As for CodeDom, I would recommend against it. This is a very old API that (in the current implementation) launches a csc.exe process to compile your code. I also believe that it does not support the complete C# language.