I'm working on my own PDF-generating library in Java and I'm having some trouble with font/text rendering. The text displayed in Java (font, word spacing, character spacing, ...) differs from the text displayed in the PDF.
In the example below, I'm using the font "Time New Roman", which is one of the PDF base fonts (so I don't have to compute and output all the font metrics into the PDF).
So concretely in my generated PDF, I have this:
BT
/F5 16 Tf
849 921 Td
(Normal Return Distribution) Tj
ET
And the font F5 is defined by the object 29 0 R, which only specifies the base font, so no font metrics are given:
29 0 obj <</Type /Font /Subtype /Type1 /BaseFont /Times-Roman>>
endobj
In Java, I'm using:
g2d.setFont(new Font("TimesRoman", Font.PLAIN, 16));
g2d.drawString("Normal Return Distribution", 849, 921);
I've drawn the text into a rectangle which matches the text boundaries, and in Java everything is OK (I've computed the string bounds in Java), but in Adobe Acrobat Reader the text is bigger than the rectangle.
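A simplified sketch of that bounds computation (illustrative, not the exact code):
FontMetrics fm = g2d.getFontMetrics();
Rectangle2D bounds = fm.getStringBounds("Normal Return Distribution", g2d);
// (849, 921) is the baseline origin passed to drawString; bounds.getY() is
// negative (the ascent above the baseline), so the rectangle starts above it
g2d.draw(new Rectangle2D.Double(849 + bounds.getX(), 921 + bounds.getY(),
        bounds.getWidth(), bounds.getHeight()));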
Here is a screenshot (I built it by taking a screenshot of Adobe Acrobat Reader displaying my PDF and a screenshot of my program displaying the buffered image, then pasting the relevant portion of the PDF screenshot below the rectangle of my program screenshot in MSPaint. To get the same rectangle size, I had to display the PDF in Adobe at 65.5% of the original size):
So we can see that the font used by Java and Adobe to display the text is the same, but the text seems a little bigger in Adobe. In fact, if I superimpose two words (one from Java on top of one from Adobe), it seems that the word spacing and the letter spacing are the same, but some letters have a 1-pixel width difference.
Why?
What can I do to solve this? I've tried to play (in the PDF) with character spacing (Tc operator), word spacing (Tw operator) and horizontal scaling (Tz operator); I think that can "work", but why isn't the scaling/spacing/... the same in both programs? Aren't these (default) parameters part of the font file (which is a TrueType one)? And how can I retrieve them correctly (without putting the parameters into my Java code manually)?
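For reference, this is roughly where those operators go in the text object (the values here are made up, just to show the placement):
BT
/F5 16 Tf
0.2 Tc          % character spacing, in unscaled text space units
1.5 Tw          % word spacing, applied to the ASCII space character
97 Tz           % horizontal scaling, as a percentage of normal width
849 921 Td
(Normal Return Distribution) Tj
ET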
Thanks
EDIT
So, as you've both explained, I'm now investigating not using PDF base fonts, to be sure that the same font (TTF file) is used by Java and Adobe Reader. But I still have one problem (the same one?).
In the PDF output, I'm generating the font like this:
31 0 obj <<
/Type /Font
/FirstChar 0
/LastChar 255
/Widths[1298 ... 646]
/Name /F7
/Encoding /WinAnsiEncoding
/Subtype /TrueType /BaseFont /Tahoma /FontDescriptor 32 0 R
>>
endobj
32 0 obj <<
/Type /FontDescriptor
/Ascent 1299
/CapHeight 1298
/Descent -269
/Flags 32
/FontBBox [0 -269 2012 1299]
/FontName /Tahoma
/ItalicAngle 0
/StemV 126
/XHeight 1298
>>
endobj
If I have understood the specification correctly, all the numbers (widths, ascent, descent, ...) are expressed in glyph units (1 em based?), where 1 em = 1000 (and 1 em is the width of the M character).
So, to generate all these parameters from Java, I first try to find the Java font size at which the width of the M character is equal to 1000 (because Java does not give access to these parameters in the Font class or other classes; and PDF needs them even though this information is in the TTF file??).
float size = 1f;
while (true) {
    font = font.deriveFont(size);
    fm = g2d.getFontMetrics(font);
    int em = fm.charWidth('M');
    if (em >= 1000)
        break;
    size += 1;
}
And then I can generate all the required parameters. For example, for the Widths array (which holds the width of each character):
String pdfWidths = "";
for (int i = 0; i <= 255; ++i) {
int width = fm.charWidth(i);
pdfWidths += width + " ";
}
But doing this, I still have my text overlapping the rectangle in Adobe Viewer.
So, to get similar text displayed, I have to set my EM limit (in the brute-force loop) to 780 for the Tahoma font, to 850 for Verdana, ... (not exactly the same text, but perhaps that's due to the anti-aliasing algorithm? See the screenshot below). So it's not a constant "limit" (which should theoretically be equal to 1000), but a variable one... Is that correct? (I think not.) If yes, how do I find this limit? If no, what is wrong?
Thanks again.
EDIT
By simply setting the font size to 1000, without brute-forcing to find the EM/line-height size, the result in the PDF is really close to the Java one.
font = font.deriveFont(1000f);
fm = g2d.getFontMetrics(font);
// Retrieve the Widths attribute
_pdfWidths = "";
for (int i = _firstChar; i <= _lastChar; ++i) {
    int width = fm.charWidth(i);
    _pdfWidths += width + " ";
}
But there is still a little difference; maybe it is due to the text drawing algorithm (kerning may differ between Java and Adobe Reader?). In the image below we can see, with Verdana, that the text is a little bit smaller (in width) in the PDF than in Java.
This answer essentially is a roundup of my comments.
The first attempt, which involved using the font "Time New Roman" (actually Times-Roman), one of the PDF base fonts (so as not to compute and output all the font metrics into the PDF), for the PDF and "TimesRoman" for Java AWT, resulted in the slight differences shown in the question.
Essentially: your app uses what the Java AWT considers TimesRoman plain at 16pt, applying font metrics in its own manner; your PDF viewer uses what it considers Times-Roman at 16 user space units, applying font metrics as specified in the PDF spec. All you can expect is some similarity (otherwise one of those contexts would have made a very bad choice), but not identity.
David actually explained that in more detail in item 1 (different fonts) and item 3 (different application of kerning and substitutions) in his answer.
Furthermore, beginning with PDF 1.5 the special treatment given to the standard 14 fonts is deprecated (section 9.6.2.1 in ISO 32000-1). Thus, by not including the font metrics explicitly in the PDF, you are doing something that has been deprecated for many years.
The next attempt, which involved not using PDF base fonts, to be sure that the same font (TTF file) is used by Java and Adobe Reader, required the calculation of character widths to embed in the PDF. In this context the assumption was made that all the numbers (widths, ascent, descent, ...) are expressed in glyph units (1 em based?), where 1 em = 1000 (and 1 em is the width of the M character). Consequently, an attempt was made to find the Java font size at which the width of the M character equals 1000, and then to generate all the required parameters from that font.
No, not em-based; instead: A font defines the glyphs at one standard size. This standard is arranged so that the nominal height of tightly spaced lines of text is 1 unit. Thus, 1000 glyph space units are the height of that nominal line.
This led to the question of what exactly that "nominal line" is. Fortunately it is easier to approach this the other way around: a font at size 1 is, by definition, a font for which that "nominal line" has a height of 1. Thus, shouldn't the Widths array be filled with 1000 * fm.charWidth(i), where fm are the metrics of the font at size 1? Or, as AWT works with int widths, with fm.charWidth(i), where fm are the metrics of the font at size 1000?
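Put differently, for a simple font a viewer advances by Widths[code] / 1000 * the font size (ignoring Tc, Tw and Tz), so the /Widths entries should be the advances of the font at size 1000. A minimal sketch of that correspondence (baseFont and g2d are placeholders here):
Font font1000 = baseFont.deriveFont(1000f);
FontMetrics fm1000 = g2d.getFontMetrics(font1000);
int widthsEntry = fm1000.charWidth('N');        // the value written into /Widths
double advanceAt16 = widthsEntry / 1000.0 * 16; // what a PDF viewer advances at size 16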
Taking this into account, by simply setting the font size to 1000, without brute-forcing to find the EM/line-height size, the result in the PDF is really close to the Java one. But there is still a little difference; maybe it is due to the text drawing algorithm (kerning may differ between Java and Adobe Reader?). In the image below we can see, with Verdana, that the text is a little bit smaller (in width) in the PDF than in Java.
Have a look at the FontMetrics.charWidth method comment: Note that the advance of a String is not necessarily the sum of the advances of its characters. AWT additionally applies kerning etc., resulting in slight deviations. In a PDF, though, using a single Tj operation, those advances do add up.
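A quick way to see that effect (a sketch; fm here is assumed to be the FontMetrics of the font in question):
String s = "Normal Return Distribution";
int sumOfAdvances = 0;
for (char c : s.toCharArray())
    sumOfAdvances += fm.charWidth(c);
int stringAdvance = fm.stringWidth(s);
// stringAdvance may differ slightly from sumOfAdvances when AWT applies
// kerning or substitutions; a single Tj operation always yields the plain sum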
If you want to use kerning in PDFs, you have to explicitly write those deviations from the standard widths. Here the TJ operator is quite handy, allowing a mixed array of strings and offsets as its parameter.
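For illustration, a TJ array interleaves strings with numbers expressed in thousandths of a unit of text space; each number is subtracted from the current horizontal position, so a positive value moves the following text to the left (the kerning values below are made up):
BT
/F7 16 Tf
849 921 Td
[ (Nor) 15 (mal Retur) 20 (n Distribution) ] TJ
ET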
If you want to substitute some characters by e.g. ligatures, you also have to do that yourself.
There are a number of possible explanations for this, all contributing to the fact that using the standard 14 fonts as defined in PDF is perhaps legal but generally not a smart thing to do. It introduces the kind of ambiguities you run into. PDF generally was designed to avoid such ambiguities; in that sense allowing non-embedded and not properly specified fonts was a bad idea.
If you look closely at the character shapes in your text, I might venture to say you're actually looking at different fonts. Similar, yet different. Look at the "i" for example and how much higher the dot on the "i" is in one case. The reason for this could be that Adobe Reader has its own set of fonts and doesn't use the system fonts (as Java probably does). Think about it - how else could Adobe Reader always display those fonts properly, regardless of the system it runs on?
It might actually be worse. If I search through my Adobe Reader installation I don't find the Times font (not "Times New Roman" as you call it; that is a different font). So it could well be that Adobe Reader uses a different font to mimic Times (and some of the other base 14 fonts). I'm not 100% sure of this, but I do know that Acrobat and Reader used to use MultiMaster fonts to emulate non-embedded fonts.
Additionally, the way you render your text in PDF does not use inter-character kerning, while it might well be that Java is smart enough to apply some additional kerning or to use character substitutions (such as using one glyph to represent the combination "ffl" instead of three individual characters). PDF is capable of using kerning and those special glyphs, but you'll have to do the work to make sure they are used...
If you want to be absolutely sure your PDF looks exactly the same as your Java rendering, figure out what the character positions are in Java. Then write your PDF file in such a way that each character gets positioned at the exact same position...
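A sketch of how you could obtain those positions on the Java side (using a GlyphVector; how you then emit the corresponding Td/Tj or TJ operations depends on your library):
FontRenderContext frc = g2d.getFontRenderContext();
GlyphVector gv = font.createGlyphVector(frc, "Normal Return Distribution");
for (int i = 0; i < gv.getNumGlyphs(); i++) {
    // Offset of glyph i from the start of the string, in points
    Point2D pos = gv.getGlyphPosition(i);
    // Emit this glyph at x = 849 + pos.getX() in the PDF content stream
}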