How to know which CMaps are compatible with a font in iText?

Posted 2020-08-01 05:17

Question:

I checked my font's encodings (the font is an OpenType font); the result is below:

PostScript name: HiraKakuProN-W3
Available code pages:
encoding[0] = 1252 Latin 1
encoding[1] = 1251 Cyrillic
encoding[2] = 1253 Greek
encoding[3] = 932 JIS/Japan

Then I create the font with this code:

Font f = new Font(BaseFont.createFont("hirafont.otf", "Identity-V", BaseFont.EMBEDDED));

Apart from "Identity-V" and "Identity-H", I can't use any other CMaps (such as "UniJISX0213-UTF32-H/V ..."). In this font I also see many glyphs that are drawn rotated by 90 degrees. How do I map a Unicode character to the rotated glyph for that character in the font?

Example: '〔' (0x3014, 12308, LEFT TORTOISE SHELL BRACKET) maps to index 9265 in the font.

---------updated code--------

PdfEncodings.loadCmap("UniJISX0213-UTF32-V", PdfEncodings.CRLF_CID_NEWLINE);
String temp = "a";

byte[] text = temp.getBytes("Shift_JIS");
String cid = PdfEncodings.convertCmap("UniJISX0213-UTF32-V", text);
BaseFont bf = BaseFont.createFont("hiraginoFont.otf", BaseFont.IDENTITY_V, BaseFont.EMBEDDED);
Paragraph p = new Paragraph(cid, new Font(bf, 14));

Answer 1:

I had to dig into my first book to find a reference to what you're asking. A long time ago, we used to distribute the following jar: itext-asiancmaps.jar. This jar is not to be confused with itext-asian.jar, which is used for CJK fonts.

It took me a while to find a copy of that jar. You'll find it in this old ZIP file: extrajars-2.1.zip

According to my first book (written in 2005-2006), you can use this jar for encodings that are not supported by the CMaps in itext-asian.jar.

It is used like this:

PdfEncodings.loadCmap("GBK2K-H", PdfEncodings.CRLF_CID_NEWLINE);
byte[] text = my_GB_encoded_text;
String cid = PdfEncodings.convertCmap("GBK2K-H", text);
BaseFont bf = BaseFont.createFont("STSong-Light",
    BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Paragraph p = new Paragraph(cid, new Font(bf, 14));
document.add(p);

As you know, BaseFont.IDENTITY_H and BaseFont.IDENTITY_V are the horizontal and vertical identity mappings for 2-byte CIDs. The PdfEncodings class can convert text in a specific encoding to a String of 2-byte CIDs. In this case, the original text was encoded in GB 18030-2000; we needed to convert it to a CID String so that iText could use the IDENTITY_H encoding.
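To make the idea of a "String with 2-byte CIDs" concrete, here is a minimal sketch in plain Java that does not use iText at all. The class `CidStringSketch`, its two methods, and the lookup table are hypothetical names invented for this illustration; the single entry 0x3014 → 9265 is borrowed from the question and has not been checked against any real CMap file. It only shows the shape of the conversion: every char of the resulting String carries one CID, and an identity encoding then amounts to writing each char as two bytes, high byte first.

```java
import java.util.HashMap;
import java.util.Map;

public class CidStringSketch {

    // Hypothetical stand-in for a CMap lookup: Unicode code point -> CID.
    // A real CMap is read from a resource file; this table is an assumption.
    static String toCidString(String text, Map<Integer, Integer> cmap) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            Integer cid = cmap.get(cp);
            // Unmapped characters fall back to CID 0 (the .notdef glyph).
            sb.append((char) (cid == null ? 0 : cid.intValue()));
        }
        return sb.toString();
    }

    // What an identity mapping boils down to: each char (one CID) is
    // written as two bytes, high byte first.
    static byte[] identityBytes(String cidString) {
        byte[] out = new byte[cidString.length() * 2];
        for (int i = 0; i < cidString.length(); i++) {
            char c = cidString.charAt(i);
            out[2 * i] = (byte) (c >> 8);
            out[2 * i + 1] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0x3014, 9265); // illustrative value taken from the question

        String cid = toCidString("\u3014", cmap);
        for (byte b : identityBytes(cid)) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // prints "24 31 " because 9265 == 0x2431
    }
}
```

In this picture, a String produced by PdfEncodings.convertCmap plays the role of `cid` above, which is why it is paired with BaseFont.IDENTITY_H or BaseFont.IDENTITY_V rather than with a named CMap.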

In my first book, I wrote: "I insert this sample for the sake of completeness. In the past three years, only a handful of people have posted questions about this." By the time I wrote my second book (2009-2010), nobody was asking about this functionality, so I stopped distributing the jar with the CMaps and the example didn't make it into the second book.

I am now reviving this old example in the hope that it is useful. I tested it once in 2005 or 2006, and it worked back then. I can't guarantee that it still works today.



Tags: pdf itext