I create PDFs and concatenate them into a single PDF.
My resulting PDF is a lot bigger than I had expected in file size.
I realised that my output PDF has a ton of duplicate font, and it is the reason of unexpectedly big file size.
Here, my question is:
I would like to create PDFs which only embed font information, so let they use Windows System Font.
When I merge them into a single PDF, I insert actual font which PDF needs.
If possible, please let me know how to do it.
I've created the MergeAndAddFont example to explain the different options.
We'll create PDFs using this code snippet:
public void createPdf(String filename, String text, boolean embedded, boolean subset) throws DocumentException, IOException {
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
BaseFont bf = BaseFont.createFont(FONT, BaseFont.WINANSI, embedded);
bf.setSubset(subset);
Font font = new Font(bf, 12);
document.add(new Paragraph(text, font));
// step 5
document.close();
}
We use this code to create 3 test files, 1, 2, 3 and we'll do this 3 times: A, B, C.
The first time, we use the parameters embedded = true
and subset = true
, resulting in the files testA1.pdf with text "abcdefgh"
(3.71 KB), testA2.pdf with text "ijklmnopq"
(3.49 KB) and testA3.pdf with text "rstuvwxyz"
(3.55 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.
Now we merge these files using the following code, using the smart
parameter to indicate whether we want to use PdfCopy
or PdfSmartCopy
:
public void mergeFiles(String[] files, String result, boolean smart) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy;
if (smart)
copy = new PdfSmartCopy(document, new FileOutputStream(result));
else
copy = new PdfCopy(document, new FileOutputStream(result));
document.open();
PdfReader[] reader = new PdfReader[3];
for (int i = 0; i < files.length; i++) {
reader[i] = new PdfReader(files[i]);
copy.addDocument(reader[i]);
}
document.close();
for (int i = 0; i < reader.length; i++) {
reader[i].close();
}
}
When we merge the document, be it with PdfCopy
or PdfSmartCopy
, the different subsets of the same font will be copied as separate objects in the resulting PDF testA_merged1.pdf / testA_merged2.pdf (both 9.75 KB).
This is the problem you are experiencing: PdfSmartCopy
can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
The second time, we use the parameters embedded = true
and subset = false
, resulting in the files testB1.pdf (21.38 KB), testB2.pdf (21.38 KB) and testA3.pdf (21.38 KB). The font is fully embedded and the file size of a single file is a lot bigger than before because the full font is embedded.
If we merge the files using PdfCopy
, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.16 KB). This is definitely not what you want!
However, if we use PdfSmartCopy
, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (21.95 KB) which is much smaller than we had with PdfCopy
. It's still bigger than the document with the subsetted fonts, but if you're concatenating a huge amount of files, the result will be better if you embed the complete font.
The third time, we use the parameters embedded = false
and subset = false
, resulting in the files testC1.pdf (2.04 KB), testC2.pdf (2.04 KB) and testC3.pdf (2.04 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different.
We merge the files using PdfSmartCopy
, resulting in testC_merged1.pdf (2.6 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.
To fix this, we need to embed the font:
private void embedFont(String merged, String fontfile, String result) throws IOException, DocumentException {
// the font file
RandomAccessFile raf = new RandomAccessFile(fontfile, "r");
byte fontbytes[] = new byte[(int)raf.length()];
raf.readFully(fontbytes);
raf.close();
// create a new stream for the font file
PdfStream stream = new PdfStream(fontbytes);
stream.flateCompress();
stream.put(PdfName.LENGTH1, new PdfNumber(fontbytes.length));
// create a reader object
PdfReader reader = new PdfReader(merged);
int n = reader.getXrefSize();
PdfObject object;
PdfDictionary font;
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(result));
PdfName fontname = new PdfName(BaseFont.createFont(fontfile, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED).getPostscriptFontName());
for (int i = 0; i < n; i++) {
object = reader.getPdfObject(i);
if (object == null || !object.isDictionary())
continue;
font = (PdfDictionary)object;
if (PdfName.FONTDESCRIPTOR.equals(font.get(PdfName.TYPE))
&& fontname.equals(font.get(PdfName.FONTNAME))) {
PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
font.put(PdfName.FONTFILE2, objref.getIndirectReference());
}
}
stamper.close();
reader.close();
}
Now, we have the file testC_merged2.pdf (22.03 KB) and that's actually the answer to your question. As you can see, the second option is better than this third option.
Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to use it as a composite font by choosing the encoding IDENTITY-H
or IDENTITY-V
), you can no longer choose whether or not to embed the font, whether or not to subset the font. As defined in ISO-32000-1, iText will always embed composite fonts and will always subset them.
This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts, but use so-called CJK fonts. They CJK fonts will use font packs that can be downloaded by Adobe Reader.