XmlSerializer startup HUGE performance loss on 64-bit systems

Posted 2019-01-11 03:56

Question:

I am experiencing a really HUGE performance loss when calling a simple XmlSerializer.Deserialize() on a class with lots of fields.

NOTE: I'm writing the code without Visual Studio, at home, so it may have some errors.

My serializable class is flat and has hundreds of fields:

[Serializable]
public class Foo // XmlSerializer only handles public types
{
    public Foo() { }

    [XmlElement(ElementName = "Field1")]
    public string Field1;

    // [...] 500 Fields defined in the same way

    [XmlElement(ElementName = "Field500")]
    public string Field500;
}

My application deserializes an input string (even small):

 StringReader sr = new StringReader(@"<Foo><Field1>foo</Field1></Foo>");
 XmlSerializer serializer = new XmlSerializer(typeof(Foo));
 object o = serializer.Deserialize(sr);

Running the application on 32-bit systems (or forced to 32-bit with corflags.exe), the code takes about ONE SECOND the first time (temp serialization class generation, and all...), then it's close to 0.

Running the application on 64-bit systems, the code takes ONE MINUTE the first time, then it's close to 0.
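A minimal sketch of how the first-call and later-call timings above could be measured, assuming a Stopwatch-based harness. The two-field Foo is only a stand-in, and TimeDeserialize is a name introduced here for illustration; reproducing the one-minute delay would need the full class with hundreds of fields.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Serialization;

// Stand-in for the real class; the slowdown described above needs hundreds of fields.
public class Foo
{
    [XmlElement(ElementName = "Field1")]
    public string Field1;

    [XmlElement(ElementName = "Field2")]
    public string Field2;
}

public static class Program
{
    // Hypothetical helper: builds a serializer, deserializes one document,
    // and returns the elapsed wall-clock time for both steps together.
    public static TimeSpan TimeDeserialize(string xml)
    {
        Stopwatch sw = Stopwatch.StartNew();
        XmlSerializer serializer = new XmlSerializer(typeof(Foo));
        using (StringReader sr = new StringReader(xml))
        {
            serializer.Deserialize(sr);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const string xml = "<Foo><Field1>foo</Field1></Foo>";
        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
        Console.WriteLine("64-bit process: " + (IntPtr.Size == 8));
        Console.WriteLine("First call:  " + TimeDeserialize(xml));
        Console.WriteLine("Second call: " + TimeDeserialize(xml));
    }
}
```

Running the same binary once as 64-bit and once forced to 32-bit should make the difference in the "First call" line visible, if the class is large enough.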

What could possibly hang the system for such a long time, during the first execution of an XmlSerializer, for a big class, on a 64-bit system?

Right now I'm not sure if I have to blame temp class generation/removal, XML name table initialization, CAS, Windows Search, AntiVirus, or Santa Claus...

SPOILERS

Here are my tests; don't read this if you don't want to be sidetracked by my (possible) analysis mistakes.

  • Running the code from the Visual Studio debugger makes the code run FAST even on 64-bit systems
  • Adding the (totally undocumented) system.diagnostics switch "XmlSerialization.Compilation", which prevents the system from removing the serialization temp classes, makes the code run FAST even on 64-bit systems
  • Taking the temp FooXmlSerializer class created by the runtime, including the .cs in my project, and using it instead of the XmlSerializer, makes the code run FAST even on 64-bit systems
  • Creating the same FooXmlSerializer class with sgen.exe, including the .cs in my project, and using it instead of the XmlSerializer, makes the code run FAST even on 64-bit systems
  • Creating the same FooXmlSerializer class with sgen.exe, referencing the Foo.XmlSerializers.dll assembly in my project, and using it instead of the XmlSerializer, makes the code run SLOW even on 64-bit systems (this bugs me a lot)
  • The performance loss only happens if the input to deserialize actually contains a field of the big class (this also bugs me a lot)

To further explain the last point, if I have a class:

[Serializable]
public class Bar // XmlSerializer only handles public types
{
    public Bar() { }

    [XmlElement(ElementName = "Foo")]
    public Foo Foo; // my class with 500 fields
}

Deserialization is slow only when the input contains a Foo child, even if I have already performed a deserialization of Bar:

 // Distinct variable names so both snippets compile in the same scope.
 StringReader sr1 = new StringReader(@"<Bar></Bar>");
 XmlSerializer serializer1 = new XmlSerializer(typeof(Bar));
 object o1 = serializer1.Deserialize(sr1); // FAST

 StringReader sr2 = new StringReader(@"<Bar><Foo><Field1>foo</Field1></Foo></Bar>");
 XmlSerializer serializer2 = new XmlSerializer(typeof(Bar));
 object o2 = serializer2.Deserialize(sr2); // SLOW

EDIT: I forgot to say that I analyzed the execution with Process Monitor, and I don't see any task taking a long time from my app, from csc.exe, or from anything Framework-related. The system just does other stuff (or I am missing something), like antivirus, explorer.exe, and Windows Search indexing (I already tried turning them off).

Answer 1:

I don't know if this is related at all, but I had an issue with XSLT and found those rather interesting comments by Microsoft about the 64-Bit JITter:

The root of the problem is related to two things: First, the x64 JIT compiler has a few algorithms that are quadratically scaling. One of them is the debug info generator, unfortunately. So for very large methods, it really gets out of control.

[...]

some algorithms in the 64 bit JIT that have polynomial scaling. We're actually working on porting the 32 bit JIT compiler to x64, but that won't see the light of day until the next side-by-side release of the runtime (as in "2.0 & 4.0 run side-by-side, but 3.0/3.5/3.5SP1 were 'in-place' releases"). I've switched this over to a 'suggestion' so I can keep it attached to the JIT-throughput work item to make sure this is fixed when the newly ported JIT is ready to ship.

Again, this is about a completely different issue, but it appears to me that the 64-Bit JITter comments are universal.



Answer 2:

UPDATE:

I was able to reproduce this. Investigation shows that most of the time was spent in the JIT compiler:

JittingStarted: "Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderFoo", "Read2_Foo", "instance class SerializersTester.Foo"

You can easily prove that without any profiler tool:

  • Generate *.XmlSerializers.dll via sgen for x86 and x64 targets
  • Generate native images via ngen

You will notice that generating the x64 native image is much slower than generating the x86 one.

The exact reason is hidden in the x64 JIT internals (BTW, it is completely different from the x86 JIT), and unfortunately I don't have enough spare time to find it.
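Another way to look at JIT cost in isolation, as a rough sketch: RuntimeHelpers.PrepareMethod forces a method to be compiled immediately, so timing that call approximates the JIT time for that one method. The Work method below is a hypothetical stand-in for a large generated method such as Read2_Foo, and TimeJit is a name introduced for illustration; this is not how the measurement above was taken.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class JitTiming
{
    // Hypothetical stand-in for a large generated method such as Read2_Foo.
    public static int Work(int x)
    {
        return x * 2 + 1;
    }

    // Asks the runtime to compile the method now and returns how long that took.
    // If the method has already been compiled, the call returns almost immediately.
    public static TimeSpan TimeJit(MethodInfo method)
    {
        Stopwatch sw = Stopwatch.StartNew();
        RuntimeHelpers.PrepareMethod(method.MethodHandle);
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        MethodInfo method = typeof(JitTiming).GetMethod("Work");
        Console.WriteLine("JIT time for Work: " + TimeJit(method));
    }
}
```

Comparing the printed time between a 32-bit and a 64-bit run of the same binary, with a genuinely large method in place of Work, would be one way to check whether the JIT is where the time goes.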

To avoid this performance loss, you can generate the serializer assembly via sgen, reference it, and compile it to a native image via ngen during application setup on the end user's PC.



Answer 3:

To clarify the "XmlSerialization.Compilation" switch, this is what happens:

If we run the code without a .config file on 64-bit, it's slow.

If we add the following section to the .config file for the application:

<configuration>
   <system.diagnostics>
     <switches>
        <add name="XmlSerialization.Compilation" value="4"/>
     </switches>
   </system.diagnostics>
</configuration>

The result is the following:

  • The .cs file, DLL, and PDB file for the serializer are left in the temp folder
  • The serializer starts quickly; it's still slower than on 32-bit but definitely acceptable (1-2 seconds instead of 60)

Maybe creating the DLL in debug mode (since PDB files are available) changes the behavior of the JIT compiler, making it fast again...
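One way to probe this guess, sketched under assumptions: an assembly built in debug mode normally carries a DebuggableAttribute with IsJITOptimizerDisabled set, which can be read through reflection. The helper name IsJitOptimizerDisabled is introduced here for illustration, and locating the generated serializer assembly (for example the DLL left in the temp folder) is left out; the sketch only inspects assemblies that are already at hand.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class DebugInfoCheck
{
    // Returns true when the assembly asks the JIT to skip optimizations,
    // which is typical of assemblies compiled in debug mode.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        object[] attrs = assembly.GetCustomAttributes(typeof(DebuggableAttribute), false);
        foreach (DebuggableAttribute attr in attrs)
        {
            if (attr.IsJITOptimizerDisabled)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // Inspect this program's own assembly and the core library as examples.
        Assembly self = typeof(DebugInfoCheck).Assembly;
        Assembly core = typeof(string).Assembly;
        Console.WriteLine(self.GetName().Name + ": " + IsJitOptimizerDisabled(self));
        Console.WriteLine(core.GetName().Name + ": " + IsJitOptimizerDisabled(core));
    }
}
```

If the serializer DLL kept by the switch reports true here while the normally generated one does not, that would support the debug-mode explanation; the check itself does not prove it.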



Answer 4:

Microsoft has known about this since the release of the 64 bit .NET:

http://connect.microsoft.com/VisualStudio/feedback/details/508748/memory-consumption-alot-higher-on-x64-for-xslcompiledtransform-transform-then-on-x86

From MSFT: "the x64 JIT compiler has a few algorithms that are quadratically scaling. ... it's been something that we've seen a number of times since the 64 bit framework first released in 2005." and

"This issue is a) known, and b) not really trivial to address. It's a design issue with the 64 bit JIT. We're in the early stages of replacing our 64-bit JIT implementation, so it will eventually get address, but not in the CLR 4.0 timeframe, unfortunately."