-->

OpenXML hanging while writing elements

2020-07-22 18:35发布

问题:

I have a program which basically pulls data from a database, caches it to a file and then exports that data to multiple formats (Excel, Excel 2003, CSV). I'm using the OpenXML SDK 2.0 to do the Excel work. These export processes are run in parallel (using Parallel.ForEach), and the amount of data can be pretty large - e.g. some CSVs are 800MB. During these larger exports, I've noticed that the writing of the XML documents will hang. For instance, if I have 8 exporting in parallel, at some point they will all just "pause". They all hang around the same point:

//this.Writer is an OpenXmlWriter which was created from a WorksheetPart.
this.Writer.WriteElement(new Cell()
{
    CellValue = new CellValue(value),
    DataType = CellValues.String
});

When this happens, I pause the debugger (VS2013 in this case) and notice that all threads are blocking around the same portion of code - some are a bit deeper in the OpenXML SDK - but they all stem from the call to OpenXmlWriter.WriteElement.

I dug through the source using JustDecompile but didn't find any answers. It appears that there is an intermediary stream in use which is writing to isolated storage and this is, for some reason, blocking. The underlying stream for each of these is a FileStream.

Here is a screenshot showing all (8 in this case) parallel tasks blocked at or inside the OpenXmlWriter.WriteElement method:

Complete Stack for one of these hung threads - with annotations.

WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName Normal
WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName(int retryCount, out string fileName)     
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.EnsureIsolatedStoreStream()  

//---> Why are we writing to isolated storage at all?
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.SwitchModeIfNecessary()  
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOFileItemStream.Write(byte[] buffer, int offset, int count)   
System.dll!System.IO.Compression.DeflateStream.WriteDeflaterOutput(bool isAsync)     
System.dll!System.IO.Compression.DeflateStream.Write(byte[] array, int offset, int count)    
WindowsBase.dll!MS.Internal.IO.Packaging.CompressStream.Write(byte[] buffer, int offset, int count)  
WindowsBase.dll!MS.Internal.IO.Zip.ProgressiveCrcCalculatingStream.Write(byte[] buffer, int offset, int count)   
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOModeEnforcingStream.Write(byte[] buffer, int offset, int count)  
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.FlushBuffer()     
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(char* pSrc, char* pSrcEnd)    
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteString(string text)  
System.Xml.dll!System.Xml.XmlWellFormedWriter.WriteString(string text)   
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteAttributesTo(System.Xml.XmlWriter xmlWriter)   
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteTo(System.Xml.XmlWriter xmlWriter)     
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlPartWriter.WriteElement(DocumentFormat.OpenXml.OpenXmlElement elementObject)   

//---> At this point, threads seem to be blocking. 
MyProject.Common.dll!MyProject.Common.Export.ExcelWriter.WriteLine(string[] values) Line 117

One more thing worth mentioning is that while there are 8 things (in this case) being exported at once, each individual exporter is writing to many files in series. For instance, a given export may have 150 underlying files it is exporting to - the input data is segmented and only a portion is written to each file. Basically, I cache the bulk data from the database, then read a line and push it (in series - one-by-one) to the streams which should include this data. The point is that if there are 8 exporters running, there could be, maybe, 1,000 files being written too but only 8 actively writing at any given time.

回答1:

I know that question is pretty old, but this is known Microsoft issue with OpenXml-IsolatedFileStorage. You can read about workaround here http://support.microsoft.com/kb/951731:

The IsolatedStorageFile class is not thread safe, IsolatedStorageFile is static and shared between all PackagePart objects. So when multiple PackagePart streams using IsolatedStorageFile objects to buffer data are accessed for writing (includes flushing as well), a thread safety problem in the IsolatedStorageFile class is exposed, causing a deadlock.

The basic idea is to wrap a stream of PackagePart and lock writing to it. They pointed an example with a wrapped stream. Here is implementation:

public class PackagePartStream : Stream
{
    private readonly Stream _stream;

    private static readonly Mutex Mutex = new Mutex(false);

    public PackagePartStream(Stream stream)
    {
        _stream = stream;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _stream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _stream.SetLength(value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _stream.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        Mutex.WaitOne(Timeout.Infinite, false);
        _stream.Write(buffer, offset, count);
        Mutex.ReleaseMutex();
    }

    public override bool CanRead
    {
        get { return _stream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return _stream.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return _stream.CanWrite; }
    }

    public override long Length
    {
        get { return _stream.Length; }
    }

    public override long Position
    {
        get { return _stream.Position; }
        set { _stream.Position = value; }
    }

    public override void Flush()
    {
        Mutex.WaitOne(Timeout.Infinite, false);
        _stream.Flush();
        Mutex.ReleaseMutex();
    }

    public override void Close()
    {
        _stream.Close();
    }

    protected override void Dispose(bool disposing)
    {
        _stream.Dispose();
    }
}

And example of usage:

var worksheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();
var workSheetWriter = OpenXmlWriter.Create(new PackagePartStream(worksheetPart.GetStream()));
workSheetWriter.WriteStartElement(new Worksheet());
//rest of your code goes here ...