Prior versions of Apple's iWork suite used a very simple document format:
- documents were Bundles of resources (folders, zipped or not)
- the bundle contained an index.apxl[z] file describing the document structure in a proprietary but fairly easy to understand schema
iWork '13 has completely redone the format. Documents are still bundles, but what was in the index XML file is now encoded in a set of binary files with type suffix .iwa packed into Index.zip.
In Keynote, for example, there are the following iwa files:
AnnotationAuthorStorage.iwa
CalculationEngine.iwa
Document.iwa
DocumentStylesheet.iwa
MasterSlide-{n}.iwa
Metadata.iwa
Slide{m}.iwa
ThemeStylesheet.iwa
ViewState.iwa
Tables/DataList.iwa
for MasterSlides 1…n and Slides 1…m
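A listing like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the document bundle is a plain folder and that Index.zip can be read by a standard zip library. The function name list_iwa_members and the example path are made up for illustration.

```python
import zipfile


def list_iwa_members(index_zip_path):
    """Return the sorted names of all .iwa members in an Index.zip archive.

    Assumption: Index.zip is an ordinary zip archive readable by zipfile.
    """
    with zipfile.ZipFile(index_zip_path) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".iwa"))
```

Usage would look something like list_iwa_members("Example.key/Index.zip"), where Example.key is a hypothetical unzipped Keynote bundle.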
The purpose of each of these is fairly clear from its name. The files even appear uncompressed, with essentially all content text directly visible as strings among the binary blobs (albeit with some RTF/NSAttributedString-like garbage in the midst of the readable ASCII characters).
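To see the readable strings mentioned above without a hex editor, something along the lines of the Unix strings tool is enough. The sketch below is an approximation of that idea, not a parser for the format: printable_runs and the minimum run length of 4 are my own choices, and it only looks at plain ASCII.

```python
import re


def printable_runs(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Applied to the raw bytes of one of the iwa files, this should surface the visible slide text along with whatever ASCII fragments happen to sit in the surrounding binary data.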
I have posted the unpacked Index of a simple example Keynote document here: https://github.com/jrk/iwork-13-format.
However, the overall file format is non-obvious to me. Apple has a long history of using simple, platform-standard formats like plists for encoding most of their documents, but there is no clear type tag at the start of the files, and it is not obvious to me what these iwa files are.
Do these files ring any bells? Is there evidence they are in some reasonably comprehensible serialization format?
Rummaging through the Keynote app runtime and class dumps with F-Script, the only evidence I've found is for some use of Protocol Buffers in the serialization classes which seem to be used for iWork, e.g.: https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/iWorkImport.framework/TSPArchiverBase.h.
Quickly piping a few of the files through protoc --decode_raw with the first 0…16 bytes lopped off produced nothing obviously usable.
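For reference, the offset experiment can be scripted roughly as follows. This is a sketch under assumptions: it requires protoc to be on the PATH, the helper names (candidate_slices, try_decode_raw, scan_file) are hypothetical, and a clean exit from protoc only means the bytes happen to parse as some protobuf message, not that the offset is right.

```python
import subprocess


def candidate_slices(data: bytes, max_skip: int = 16):
    """Yield (offset, payload) pairs with the first 0..max_skip bytes removed."""
    for offset in range(min(max_skip, len(data)) + 1):
        yield offset, data[offset:]


def try_decode_raw(payload: bytes):
    """Feed payload to `protoc --decode_raw`; return its text output, or None on failure."""
    result = subprocess.run(
        ["protoc", "--decode_raw"], input=payload, capture_output=True
    )
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


def scan_file(path, max_skip: int = 16):
    """Return {offset: decoded_text} for every offset where protoc exits cleanly."""
    with open(path, "rb") as f:
        data = f.read()
    hits = {}
    for offset, payload in candidate_slices(data, max_skip):
        decoded = try_decode_raw(payload)
        if decoded is not None:
            hits[offset] = decoded
    return hits
```

Calling scan_file("Document.iwa") on one of the unpacked files would then show which offsets, if any, yield output worth a closer look.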