How to zip a WordprocessingML folder into readable

2019-01-21 09:30发布

问题:

I have been trying to write a simple Markdown -> docx parser/writer, but am completely stuck with the last part, which should be the easiest: i.e. compressing the folder into a .docx that Word, or any other .docx reader, will recognize.

My parser-writer is irrelevant really: I have this problem if I simply unzip any old Word-produced *.docx and then try to recompress it with the usual compression utilities, giving it the file-ending docx. Is there some mysterious header I should be adding, or do I need a special OPC compression utility, or what?

I don't so much want a tool that will do this, as to figure out what is supposed to be there. It seems to be independent of the WordprocessingML specification.

Needless to say I don't know anything about compression. Everything I can find via Google has to do with fancy utilities you can use in business, but I'm making a little executable that would be GPLd or something, and should work on anything.

回答1:

The most common problem around manually zipping together Open XML documents is that it will not work if you zip the directory instead of the contents. In other words, the[content_types].xml file, and the word, docProps, and _rels directories need to reside at the root level of the zip file.



回答2:

Here are steps to unzip my.docx and re-zip:

% mkdir unzipped
% cd unzipped/
% unzip ../my.docx    
% zip -r ../rezipped.docx *
% open ../rezipped.docx 


回答3:

Further to what Mica said, the contents of the ZIP file are organised according to the Open Packaging Convention; cf. Microsoft's Essentials of the Open Packaging Convention.

You can use the .NET System.IO.Packaging to make and manipulate .docx files; this class is implemented in the Mono project.



回答4:

The compression algorithm used is "Zip" (Base 64) compression.

7zip seems to offer this, though i have no tested it.