When working with Office Open XML documents, e.g., as created by Word, Excel, or PowerPoint since Office 2007 was released, you will often want to clone, or copy, an existing document and then make changes to that clone, thereby creating a new document.
Several questions have already been asked and answered (sometimes incorrectly or at least not optimally) in this context, showing that users are indeed facing issues. For example:
- Duplicating Word document using OpenXml and C#
- Word OpenXml Word Found Unreadable Content
- Open XML SDK: opening a Word template and saving to a different file-name
- docx document corrupted when copied though OpenXML C#
So, the questions are:
- What are the possible ways to correctly clone, or copy, those documents?
- Which way is the most efficient one?
The following sample class shows multiple ways to correctly copy pretty much any file and return the copy on a `MemoryStream` or `FileStream`. From that stream, you can then open a `WordprocessingDocument` (Word), `SpreadsheetDocument` (Excel), or `PresentationDocument` (PowerPoint) and make whatever changes you need, using the Open XML SDK and, optionally, the Open-XML-PowerTools.

On top of the above Open XML-agnostic methods, you can also use the following approach, e.g., in case you have already opened an `OpenXmlPackage` such as a `WordprocessingDocument`, `SpreadsheetDocument`, or `PresentationDocument`:

All of the above methods correctly clone, or copy, a document. But what is the most efficient one?
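For reference, the Open XML-agnostic copy methods come down to a few .NET base class library calls. The following is a minimal sketch of four such strategies, not the sample class itself. The `FileCopySketch` class and its method names are hypothetical.

```csharp
using System.IO;

// Hypothetical helper: class and method names are illustrative and are
// not the FileCloner class from the CodeSnippets repository.
public static class FileCopySketch
{
    // Reads the whole file into memory and writes it to an expandable MemoryStream.
    // Note: new MemoryStream(bytes) would be fixed-size, so saving a document
    // that grew while being edited could fail on such a stream.
    public static MemoryStream ReadAllBytesToMemoryStream(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        var stream = new MemoryStream(bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0;
        return stream;
    }

    // Streams the file through a read-only FileStream into an expandable MemoryStream.
    public static MemoryStream CopyFileStreamToMemoryStream(string path)
    {
        var stream = new MemoryStream();
        using (FileStream source = File.OpenRead(path))
        {
            source.CopyTo(stream);
        }
        stream.Position = 0;
        return stream;
    }

    // Copies the file on disk with File.Copy, then opens the copy for reading and writing.
    public static FileStream CopyFileToFileStream(string sourcePath, string destinationPath)
    {
        File.Copy(sourcePath, destinationPath, overwrite: true);
        return new FileStream(destinationPath, FileMode.Open, FileAccess.ReadWrite);
    }

    // Streams the source file into a newly created (or truncated) destination FileStream.
    public static FileStream CopyFileStreamToFileStream(string sourcePath, string destinationPath)
    {
        var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.ReadWrite);
        using (FileStream source = File.OpenRead(sourcePath))
        {
            source.CopyTo(destination);
        }
        destination.Position = 0;
        return destination;
    }
}
```

Any of the returned streams can then be passed to, e.g., `WordprocessingDocument.Open(stream, true)`. With the SDK's default auto-save setting, disposing the document writes the changes back to that stream.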
Enter our benchmark, which uses the `BenchmarkDotNet` NuGet package:

The above benchmark is run as follows:
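The following is a hedged sketch of what such a BenchmarkDotNet benchmark and its runner might look like. It is not the author's benchmark. The class `FileCopyBenchmarks`, its method names, the `BenchmarkEntryPoint` helper, and the 1 MiB random-byte input file are all assumptions for illustration. A real comparison would use actual .docx/.xlsx/.pptx files.

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark class; names and the input payload are assumptions.
[MemoryDiagnoser]
public class FileCopyBenchmarks
{
    private string _sourcePath = "";
    private string _destinationPath = "";

    [GlobalSetup]
    public void Setup()
    {
        // Stand-in input: 1 MiB of random bytes instead of a real Office document.
        _sourcePath = Path.GetTempFileName();
        _destinationPath = Path.GetTempFileName();
        var bytes = new byte[1024 * 1024];
        new Random(42).NextBytes(bytes);
        File.WriteAllBytes(_sourcePath, bytes);
    }

    [Benchmark(Baseline = true)]
    public long ReadAllBytesToMemoryStream()
    {
        byte[] bytes = File.ReadAllBytes(_sourcePath);
        using var stream = new MemoryStream(bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
        return stream.Length; // returned so the work is not optimized away
    }

    [Benchmark]
    public long CopyFileStreamToMemoryStream()
    {
        using FileStream source = File.OpenRead(_sourcePath);
        using var stream = new MemoryStream();
        source.CopyTo(stream);
        return stream.Length;
    }

    [Benchmark]
    public long CopyFileStreamToFileStream()
    {
        using FileStream source = File.OpenRead(_sourcePath);
        using var destination = new FileStream(_destinationPath, FileMode.Create, FileAccess.ReadWrite);
        source.CopyTo(destination);
        return destination.Length;
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        File.Delete(_sourcePath);
        File.Delete(_destinationPath);
    }
}

public static class BenchmarkEntryPoint
{
    // Call this from Main; BenchmarkDotNet expects an optimized (Release) build,
    // e.g. started with `dotnet run -c Release`.
    public static void RunAll() => BenchmarkRunner.Run<FileCopyBenchmarks>();
}
```

BenchmarkDotNet prints a summary table to the console when the run finishes. Returning a value from each benchmark method lets BenchmarkDotNet consume the result, which keeps the JIT from eliminating the copy as dead code.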
And what are the results on my machine? Which method is the fastest?
It turns out that `DoWorkUsingReadAllBytesToMemoryStream()` is consistently the fastest method. However, the difference from `DoWorkUsingCopyFileStreamToMemoryStream()` is well within the margin of error. This means that you should open your Open XML documents on a `MemoryStream` to do your processing whenever possible. And if you don't have to store the resulting document in your file system, this will be much faster still than unnecessarily using a `FileStream`.

Wherever an output `FileStream` is involved, you see a more "significant" difference (noting that a millisecond can make a difference if you process large numbers of documents). You should also note that using `File.Copy()` is actually not such a good approach.

Finally, using the `OpenXmlPackage.Clone()` method or one of its overloads turns out to be the slowest method. This is because it involves more elaborate logic than just copying bytes. However, if all you have is a reference to an `OpenXmlPackage` (or, effectively, one of its subclasses), the `Clone()` method and its overloads are your best choice.

You can find the full source code in my CodeSnippets GitHub repository. Look at the CodeSnippets.Benchmark project and FileCloner class.
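The following minimal sketch illustrates the `Clone()` route for the case where you already hold an opened package. It assumes an Open XML SDK version that offers the `Clone(Stream, bool)` overload. The `CloneSketch` class and its three methods are hypothetical names and are not part of the SDK or of the FileCloner class.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper names; only Clone(Stream, bool) is taken from the Open XML SDK.
public static class CloneSketch
{
    // Creates a minimal one-paragraph document on the stream, for trying the clone out.
    public static void CreateSample(Stream stream, string text)
    {
        using WordprocessingDocument document =
            WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document);
        MainDocumentPart mainPart = document.AddMainDocumentPart();
        mainPart.Document = new Document(new Body(new Paragraph(new Run(new Text(text)))));
    }

    // Clones an already opened document onto the given (expandable) target stream.
    // isEditable: true opens the returned clone for reading and writing.
    public static WordprocessingDocument CloneTo(WordprocessingDocument source, Stream target)
    {
        return (WordprocessingDocument)source.Clone(target, true);
    }

    // Example change applied to the clone only; the source document is untouched.
    public static void AppendParagraph(WordprocessingDocument document, string text)
    {
        Body body = document.MainDocumentPart.Document.Body;
        body.AppendChild(new Paragraph(new Run(new Text(text))));
    }
}
```

In this sketch, the clone is opened on a `MemoryStream`. This follows the recommendation above to avoid an output `FileStream` unless the result actually has to be written to disk.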