I have a program which basically pulls data from a database, caches it to a file and then exports that data to multiple formats (Excel, Excel 2003, CSV). I'm using the OpenXML SDK 2.0 to do the Excel work. These export processes are run in parallel (using Parallel.ForEach
), and the amount of data can be pretty large - e.g. some CSVs are 800MB. During these larger exports, I've noticed that the writing of the XML documents will hang. For instance, if I have 8 exporting in parallel, at some point they will all just "pause". They all hang around the same point:
//this.Writer is an OpenXmlWriter which was created from a WorksheetPart.
this.Writer.WriteElement(new Cell()
{
CellValue = new CellValue(value),
DataType = CellValues.String
});
When this happens, I pause the debugger (VS2013 in this case) and notice that all threads are blocking around the same portion of code - some are a bit deeper in the OpenXML SDK - but they all stem from the call to OpenXmlWriter.WriteElement
.
I dug through the source using JustDecompile but didn't find any answers. It appears that there is an intermediary stream in use which is writing to isolated storage and this is, for some reason, blocking. The underlying stream for each of these is a FileStream
.
Here is a screenshot showing all (8 in this case) parallel tasks blocked at or inside the OpenXmlWriter.WriteElement
method:
Complete Stack for one of these hung threads - with annotations.
WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName Normal
WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName(int retryCount, out string fileName)
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.EnsureIsolatedStoreStream()
//---> Why are we writing to isolated storage at all?
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.SwitchModeIfNecessary()
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOFileItemStream.Write(byte[] buffer, int offset, int count)
System.dll!System.IO.Compression.DeflateStream.WriteDeflaterOutput(bool isAsync)
System.dll!System.IO.Compression.DeflateStream.Write(byte[] array, int offset, int count)
WindowsBase.dll!MS.Internal.IO.Packaging.CompressStream.Write(byte[] buffer, int offset, int count)
WindowsBase.dll!MS.Internal.IO.Zip.ProgressiveCrcCalculatingStream.Write(byte[] buffer, int offset, int count)
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOModeEnforcingStream.Write(byte[] buffer, int offset, int count)
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.FlushBuffer()
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(char* pSrc, char* pSrcEnd)
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteString(string text)
System.Xml.dll!System.Xml.XmlWellFormedWriter.WriteString(string text)
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteAttributesTo(System.Xml.XmlWriter xmlWriter)
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteTo(System.Xml.XmlWriter xmlWriter)
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlPartWriter.WriteElement(DocumentFormat.OpenXml.OpenXmlElement elementObject)
//---> At this point, threads seem to be blocking.
MyProject.Common.dll!MyProject.Common.Export.ExcelWriter.WriteLine(string[] values) Line 117
One more thing worth mentioning is that while there are 8 things (in this case) being exported at once, each individual exporter is writing to many files in series. For instance, a given export may have 150 underlying files it is exporting to - the input data is segmented and only a portion is written to each file. Basically, I cache the bulk data from the database, then read a line and push it (in series - one-by-one) to the streams which should include this data. The point is that if there are 8 exporters running, there could be, maybe, 1,000 files being written too but only 8 actively writing at any given time.