Get Word ML from clipboard

Posted 2019-04-11 22:25

Question:

I am intercepting the paste event for a RichTextBox in order to process the contents before pasting. If the content contains tables, images, etc., I need to do some custom processing. If the copied selection comes from Word 2010 and consists of mixed content (e.g. text plus a table or image), Word places it on the clipboard in a number of formats. These include HTML and RTF, but I would rather work with WordML. I've used ClipSpy to check which formats and data are actually put on the clipboard, and the "Embed Source" format seems to be the one containing the WordML. I would think it could be opened as a Package:

var stream = Clipboard.GetData("Embed Source") as MemoryStream;
var package = Package.Open(stream);

It throws an EndOfStreamException, so I suspect the data might be wrapped in something else. I can write the stream to disk, open it with 7-zip and see that the contents are as expected. So basically two questions: is "Embed Source" the right DataObject format for getting the WordML? If it is, how do I deserialize it?

Answer 1:

After saving the stream to disk and doing a binary comparison against a proper .docx, I figured out that it was in fact wrapped in a Compound Document File: http://www.openoffice.org/sc/compdocfileformat.pdf. I googled the first few bytes

D0 CF 11 E0 A1 B1 1A E1

which turned out to be the signature (magic number) of the CDF format.
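A minimal sketch of the signature check described above, in C#: it compares the first eight bytes of the clipboard data with the CDF signature, which lets you tell a wrapped compound file apart from a bare .docx (a ZIP archive, which starts with 50 4B 03 04, "PK", instead). The class and method names (CompoundFileSniffer, IsCompoundFile) are hypothetical helpers, not part of any library.

```csharp
using System;
using System.IO;

// Hypothetical helper: recognises the Compound Document File (OLE2/CFB)
// header that wraps the "Embed Source" clipboard data.
public static class CompoundFileSniffer
{
    // The first eight bytes of every compound document file.
    private static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // True if the buffer starts with the CDF signature.
    public static bool IsCompoundFile(byte[] data)
    {
        if (data == null || data.Length < Signature.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
                return false;
        }
        return true;
    }

    // Stream overload: peeks at the header, then restores the original position
    // so the caller can still read the stream from where it was.
    public static bool IsCompoundFile(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(stream));

        long start = stream.Position;
        var header = new byte[Signature.Length];
        int read = 0;

        // Read() may return fewer bytes than requested, so loop until full or EOF.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
                break;
            read += n;
        }

        stream.Position = start;
        return read == header.Length && IsCompoundFile(header);
    }
}
```

Checking the header first keeps the code working if a different Word version ever puts a plain ZIP on the clipboard: only data that passes this check needs to go through the compound-file extraction step.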

The package can then be extracted from the compound file using OpenMCDF.
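A sketch of the OpenMCDF step, assuming the OpenMCDF 2.x API (CompoundFile, RootStorage.GetStream, CFStream.GetData; later major versions may differ). It also assumes Word stores the .docx bytes in a root-level stream named "Package", which is how Office typically embeds an OOXML document in OLE storage. Confirm the stream name in a CFB viewer (or 7-zip, as in the question) before relying on it. EmbedSourceReader and its methods are hypothetical names. The extracted bytes form an ordinary ZIP package, so System.IO.Packaging's Package.Open can read them once they are in a seekable stream.

```csharp
using System.IO;
using System.IO.Packaging; // WindowsBase on .NET Framework; System.IO.Packaging package on .NET Core/5+
using OpenMcdf;            // OpenMCDF 2.x (assumed version)

// Hypothetical helper that unwraps the "Embed Source" clipboard data.
public static class EmbedSourceReader
{
    // Assumption: Word puts the .docx bytes in a root-level stream with this name.
    private const string PackageStreamName = "Package";

    // Reads the compound file and returns the raw .docx (ZIP) bytes.
    public static byte[] ExtractDocxBytes(Stream embedSource)
    {
        embedSource.Position = 0;
        var compoundFile = new CompoundFile(embedSource);
        try
        {
            // Throws if no stream with this name exists at the root.
            CFStream packageStream = compoundFile.RootStorage.GetStream(PackageStreamName);
            return packageStream.GetData();
        }
        finally
        {
            compoundFile.Close();
        }
    }

    // Opens the extracted bytes as a read-only OPC package.
    public static Package OpenWordPackage(Stream embedSource)
    {
        byte[] docx = ExtractDocxBytes(embedSource);

        // Package.Open needs a seekable stream; this MemoryStream must stay
        // alive for as long as the returned Package is in use.
        var docxStream = new MemoryStream(docx, writable: false);
        return Package.Open(docxStream, FileMode.Open, FileAccess.Read);
    }
}
```

In the paste handler this would be used roughly as: take Clipboard.GetData("Embed Source") as MemoryStream, check that it is not null, and pass it to EmbedSourceReader.OpenWordPackage. Copying the .docx into its own MemoryStream keeps the Package independent of the compound file, which is closed right after extraction.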