PDFBox embedded TTF fonts not working

Posted 2019-06-21 09:27

Question:

I am using PDFBox to build a document from an existing PDF template: it opens the file, adds text to it, and saves it. This works well, except when I try to use external TTF fonts. I have tried different approaches and searched for two days for solutions, but there is not much information out there on PDFBox.

Here's some code. It uses the font "Tardy Kid" because it can't be mistaken for anything else and is unlikely to be part of any standard library.

The code executes fine, prints "TardyKid" from the println (showing that the font is loaded and its name can be read), and displays the text, but the text is rendered in Helvetica. More sophisticated parts of the code that use getStringWidth() to calculate widths suggest that the width tables are loaded successfully too. The text just doesn't display in the right font.

The code runs in the context of a larger program that opens an existing PDF document (a template) and adds text to it. It all seems to work fine except for the font. Here is the method:

 public void setText ( PDDocument document, String text ) throws IOException {
     // Add the text to the last page of the template
     int lastPage = document.getNumberOfPages() - 1;
     PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(lastPage);
     PDPageContentStream contentStream = null;
     try {
         // Arguments: appendContent = true, compress = true, resetContext = false
         contentStream = new PDPageContentStream(document,page,true,true,false);
         // Load the external TrueType font and embed it in the document
         File fontFile = new File(m_fontDir, "Tardy_Kid.ttf");
         PDFont font = PDTrueTypeFont.loadTTF(document, fontFile);
         Color color = new Color(196, 18, 47);
         float x = 100f, y = 700f;
         // Prints "TardyKid", so the font file itself is being read
         System.out.println(font.getBaseFont());
         contentStream.setFont(font, 32);
         contentStream.setNonStrokingColor(color);
         contentStream.beginText();
         contentStream.moveTextPositionByAmount(x,y);
         contentStream.drawString(text);
         contentStream.endText();
     } finally {
         // Closing the stream here means every call to setText opens and closes its own stream
         if (contentStream != null) {
             contentStream.close();
         }
     }
 }

Answer 1:

I have found the answer. I am not sure whether this is a bug in PDFBox, but if you open and close a content stream (a PDPageContentStream) more than once on the same page, it doesn't work correctly. Because the setText routine opened and closed the content stream itself, it failed whenever the routine was called more than once for a page. Moving the stream outside the routine, and opening and closing it once for the whole page, seemed to clear up this problem (and a couple of others).

This is not mentioned anywhere in the documentation or example code, and is very subtle at best. I would call it a bug, especially since it "works" (does not throw any exceptions) but creates indeterminate and/or wrong results on the page.
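
As a minimal sketch of the restructuring described above, assuming the PDFBox 1.8.x API used in the question: the class name SingleStreamPerPage, the TextItem holder, and the drawAll method are hypothetical names introduced for illustration and are not from the original post.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;

class SingleStreamPerPage {

    /** Hypothetical holder for one piece of text and its position on the page. */
    static final class TextItem {
        final String text;
        final float x;
        final float y;

        TextItem(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Draws every item on the last page through a single content stream. */
    static void drawAll(PDDocument document, PDFont font, float fontSize,
                        List<TextItem> items) throws IOException {
        int lastPage = document.getNumberOfPages() - 1;
        PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(lastPage);

        // Open the stream ONCE for the page, with the same flags as the question:
        // appendContent = true, compress = true, resetContext = false.
        PDPageContentStream contentStream =
                new PDPageContentStream(document, page, true, true, false);
        try {
            for (TextItem item : items) {
                drawText(contentStream, font, fontSize, item);
            }
        } finally {
            // Close the stream once, after all text for the page has been written.
            contentStream.close();
        }
    }

    /** Writes one item into a stream that the caller owns; it never opens or closes the stream. */
    private static void drawText(PDPageContentStream contentStream, PDFont font,
                                 float fontSize, TextItem item) throws IOException {
        contentStream.beginText();
        contentStream.setFont(font, fontSize);
        contentStream.moveTextPositionByAmount(item.x, item.y);
        contentStream.drawString(item.text);
        contentStream.endText();
    }
}
```

With this shape, the font would be loaded once by the caller (for example with PDTrueTypeFont.loadTTF, as in the question) and passed in, so the helper only ever writes into a stream that is already open.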



Answer 2:

I had a similar issue, which came from a POM update that corrupted the PDF template file when building our WAR file.

The stack trace stated "Could not read embedded TTF for font TimesNewRoman,Bold". This appeared only after an earlier error related to "pushback size", which we got past by setting a new value for the relevant system property. For reference, the exception I was seeing was: org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 480478 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize

It took us a while, but after exploding the WAR and trying to open the PDF file inside it, we noticed that the packaged file was corrupt, while the PDF file in source was not.
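
The manual comparison described above (the source file versus the copy inside the exploded WAR) can also be automated. The following is a rough sketch using only the JDK; the class name ResourceIntegrityCheck and its methods are made up for illustration, and the two paths are whatever locations apply in your build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ResourceIntegrityCheck {

    /** Computes the SHA-256 digest of a file's raw bytes. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return digest.digest(Files.readAllBytes(file));
    }

    /** Returns true when both files contain exactly the same bytes. */
    static boolean sameContent(Path source, Path packaged)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(source), sha256(packaged));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: ResourceIntegrityCheck <source-file> <packaged-file>");
            return;
        }
        Path source = Paths.get(args[0]);
        Path packaged = Paths.get(args[1]);
        // A mismatch means the build changed the file somewhere between source and package.
        System.out.println(sameContent(source, packaged)
                ? "identical"
                : "DIFFERENT: the packaged copy was modified during the build");
    }
}
```

Run against the template in the source tree and the copy extracted from the WAR, a mismatch points at the build rather than at PDFBox.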

The root cause of our issue was that we had enabled "filtering" in our POM for our resource folder. We did this so that we could use some reflection to get some values for our health check page, but filtering also corrupted the PDF file. We figured this out from the following reference: https://bitbucket.org/petermr/xhtml2stm/issues/12/pdf-files-are-being-corrupted-at-some

Below is an example of the filtering we set up that bit us:

<resources>
    <resource>
        <directory>src/main/resources</directory>
        <filtering>true</filtering>
    </resource>
</resources>

Our solution was to remove this from our pom and rework how we got the information for our health page.
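
An alternative to removing filtering altogether, offered here as a sketch and not as what was done in this answer: the maven-resources-plugin has a nonFilteredFileExtensions setting that leaves filtering on for text resources while copying files with the listed extensions byte for byte. Listing ttf alongside pdf is an assumption, made because font files are binary as well.

```xml
<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
        </resource>
    </resources>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-resources-plugin</artifactId>
            <configuration>
                <!-- Files with these extensions are copied as-is, without filtering -->
                <nonFilteredFileExtensions>
                    <nonFilteredFileExtension>pdf</nonFilteredFileExtension>
                    <nonFilteredFileExtension>ttf</nonFilteredFileExtension>
                </nonFilteredFileExtensions>
            </configuration>
        </plugin>
    </plugins>
</build>
```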