According to the PDF 1.7 specification, Sec 3.4 (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf, page 90):
The preceding sections describe the syntax of individual objects. This
section describes how objects are organized in a PDF file for efficient
random access and incremental update. A canonical PDF file initially
consists of four elements (see Figure 3.2):
- A one-line header identifying the version of the PDF specification to which the file conforms
- A body containing the objects that make up the document contained in the file
- A cross-reference table containing information about the indirect objects in the file
- A trailer giving the location of the cross-reference table and of certain special objects within the body of the file
Basically, the structure has the header, followed by the body content, then the cross-reference table, and finally the trailer, which gives the location of the xref table. The key part here is that the trailer and the xref table are at the end of the file, and the xref table contains the pertinent metadata of the body content (mainly the 10-digit byte offset of each object).
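To make the layout above concrete, here is a minimal Python sketch of how a reader could locate the xref table from the end of a classic (non-linearized) file and decode one of its entries. The helper names (build_sample, find_startxref, parse_xref_entry) are made up for this illustration, the sample file is a toy built in memory, and a real parser has to handle many more cases (incremental updates, cross-reference streams, and so on).

```python
import re
from typing import Tuple


def build_sample() -> bytes:
    """Build a tiny, hypothetical PDF-like file: header, body, xref, trailer."""
    header = b"%PDF-1.7\n"
    obj = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    body_offset = len(header)              # byte offset of object 1
    xref_offset = len(header) + len(obj)   # byte offset of the 'xref' keyword
    xref = b"xref\n0 2\n0000000000 65535 f \n" + b"%010d 00000 n \n" % body_offset
    trailer = (b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
               % xref_offset)
    return header + obj + xref + trailer


def find_startxref(data: bytes) -> int:
    """Return the xref offset announced after the last 'startxref' keyword."""
    tail = data[-1024:]  # the trailer sits at the very end of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref keyword found near the end of the file")
    return int(matches[-1])


def parse_xref_entry(entry: bytes) -> Tuple[int, int, bool]:
    """Decode one xref entry into (byte offset, generation, in_use)."""
    m = re.fullmatch(rb"(\d{10}) (\d{5}) ([nf])\s*", entry)
    if m is None:
        raise ValueError("malformed xref entry")
    return int(m.group(1)), int(m.group(2)), m.group(3) == b"n"


data = build_sample()
offset = find_startxref(data)
print(offset, data[offset:offset + 4])
print(parse_xref_entry(b"0000000009 00000 n \n"))
```

The point of the sketch is only that a reader of a standard file has to start at the end: without the last few bytes it does not know where the xref table is, and without the xref table it does not know where any object starts.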
Given that the xref table itself is located at the very end of a PDF file:
- How is it that my browser (Google Chrome) was able to partially display the PDF file (the first hundred pages or so) before the entire file was finished downloading?
See screenshot of my partially downloaded PDF file:
The type of PDF files the OP describes is also known as "web optimized" (marketing term) or "linearized" (technical term in PDF parlance).
Note that this partial display only works if two extra conditions (on top of the linearization feature of the files) are met:
- The PDF viewer needs to be able to handle these types of PDF and take advantage of the linearization feature.
- The (remote) host serving the linearized PDFs needs to support "byte streaming" (that is, HTTP byte range requests).
If byte streaming is not supported by the server or if the PDF file is not linearized, the entire file still needs to be downloaded completely before the viewer can display any page.
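As a rough illustration of the second condition, the following Python sketch shows what a byte range request looks like at the HTTP level. The function names and the example URL are hypothetical. The Range header and the 206 Partial Content status are standard HTTP. The request is only built here, not sent.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build (but do not send) a request for bytes start..end, inclusive."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req


def server_honoured_range(status: int) -> bool:
    """206 Partial Content means the server delivered just the requested range.

    A plain 200 means the server ignored the Range header and is sending
    the whole file from the beginning.
    """
    return status == 206


# Hypothetical URL, used for illustration only.
req = build_range_request("https://example.com/big.pdf", 0, 1023)
print(req.get_header("Range"))

# Sending it would look roughly like this (left commented out on purpose):
# with urllib.request.urlopen(req) as resp:
#     print(server_honoured_range(resp.status), len(resp.read()))
```

If the server answers with 200 instead of 206, the viewer has no way to skip ahead in the file, which is why partial display then falls back to a full download.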
The description about the PDF file structure quoted by the OP does not apply to linearized PDF files. These are organized in a slightly different way:
- Special rules apply to the ordering of PDF objects ("standard" PDFs can have objects in any arbitrary order).
- The PDF document needs to contain some additional structures called "hint tables" which guarantee efficient navigation within it (even if it is not yet completely downloaded).
Regarding the ordering rules, a linearized PDF contains its objects in two groups:
In the first group is the document catalogue, all document-level objects, and all objects belonging to the first-to-be-displayed page (not necessarily "page 0"!). The objects shall be numbered sequentially.
The second group holds all the other objects.
These groups shall be indexed by two xref table sections.
- The first group's xref section appears immediately after the first indirect object, very close to the beginning of the file.
- The second group's xref section is positioned at the end of the file (just as in standard, non-linearized PDFs).
The first object immediately after the %PDF-1.x header line shall contain a dictionary key indicating the /Linearized property of the file.
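A quick way to check for that marker is to look only at the start of the file. The Python sketch below is a heuristic, not a validating parser: the function name looks_linearized is made up, the sample bytes and their values are invented, and it assumes (as I read Appendix F) that the linearization dictionary sits entirely within the first 1024 bytes.

```python
import re


def looks_linearized(head: bytes) -> bool:
    """Heuristically check the first bytes of a file for a /Linearized key.

    Only the first indirect object after the header is inspected. A file
    can carry the key and still be broken, so this is a hint, not proof.
    """
    head = head[:1024]
    if not head.startswith(b"%PDF-"):
        return False
    # First "<num> <gen> obj << ... >>" after the header line.
    m = re.search(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>", head, re.DOTALL)
    return m is not None and b"/Linearized" in m.group(1)


# Invented sample data, for illustration only.
linearized_head = (b"%PDF-1.7\n1 0 obj\n"
                   b"<< /Linearized 1 /L 12345 /H [ 500 100 ] /O 4 "
                   b"/E 2000 /N 10 /T 12000 >>\nendobj\n")
plain_head = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(linearized_head))
print(looks_linearized(plain_head))
```

Because only the first kilobyte is needed, a viewer can make this decision as soon as the first bytes of the download arrive.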
This overall structure allows a conforming reader to learn the complete list of object addresses very quickly, without needing to download the complete file from beginning to end:
The viewer can display the first page(s) very fast, before the complete file is downloaded.
The user can click on a thumbnail page preview (or a link in the ToC of the file) in order to jump to, say, page 445, immediately after the first page(s) have been displayed, and the viewer can then request all the objects required for page 445 by asking the remote server via byte range requests to deliver these "out of order" so the viewer can display this page faster. (While the user reads pages out of order, the downloading of the complete document will still go on in the background...)
The technical details of PDF "linearization" can be found in the 'normative' Appendix F of Adobe's original PDF 1.7 Specification (ca. 11 MByte -- which in itself is an example of such a linearized PDF file!)