Problems with Chinese Fonts in iText-PDF on Windows

Posted 2020-03-01 01:34

Question:

I'm using an Ubuntu PC to create PDFs with iText that are partly in Chinese. To read them I use Evince. So far there have been hardly any problems.

On my PC I tried the following three BaseFonts, and all of them worked:

bf = BaseFont.createFont("MSungStd-Light", "UniCNS-UCS2-H", BaseFont.NOT_EMBEDDED); 
bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED); 
bf = BaseFont.createFont("MSung-Light","UniCNS-UCS2-H", BaseFont.NOT_EMBEDDED); 

Unfortunately, as soon as the final PDF is opened on Windows with Acrobat Reader, the document is no longer displayed correctly.

After googling the fonts to find a solution, I came across a forum thread where the problem is explained in an understandable way (MSung-Light was used there): http://community.jaspersoft.com/questions/531457/chinese-font-cannot-be-seen

You are using a built-in Chinese font in PDF. I'm not sure about the ability of this font to support both English and Chinese, or mixed language anyway.

The advantage of using an Acrobat Reader built-in font is that it produces smaller PDF files, because it relies on those fonts being available on the client machine that displays the PDF, through the pre-installed Acrobat Asian Font Pack.

However, using the PDF built-in fonts has some disadvantages that were discovered through testing on different machines, when we investigated a similar problem related to a built-in Korean font.

What should I do about it? It's not so important to be able to copy the Chinese letters. Can iText convert a paragraph to an image? Or are there any better solutions?

Answer 1:

You're using a CJK font. CJK fonts are never embedded and they require a font pack when opening such a file in Adobe Reader. Normally, Adobe Reader will ask you if you want to install such a font pack automatically. If it doesn't, you can download the appropriate font pack here.

It seems that you want to avoid having an end user install a font pack. That's understandable to some extent. What is really bad is your suggestion to avoid using a font and to draw the glyphs one by one instead. This is possible with iText (and documented in my book), but it comes with a severe warning: Don't do this! Your file will be bloated and print results risk being awful!

An alternative is to use another font, e.g. arialuni.ttf, YaHei, SimHei,... These fonts contain Chinese glyphs and you can embed a subset of these fonts into your PDF (embedding the whole font would be overkill). See for instance the FontTest example.

If you have a font program such as arialuni.ttf, you can use this code to create a BaseFont object:

BaseFont.createFont("c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

With this font, you can display Chinese characters that will be visible using any viewer on any OS. If you don't have arialuni.ttf, you need to look for another font and use the FontTest example to test whether Chinese is supported (if you don't see any text after "Chinese:", then Chinese isn't supported).

Extra answer in reply to your comment:

Please forget about iText-Asian as that is a jar you need when you want to use CJK fonts. You explicitly say you don't want to use CJK fonts, so you don't need to use iText-Asian.

If you want to embed the font (as opposed to rely on a font pack), you need to pick a font program that knows how to draw Chinese characters. This immediately makes your question regarding "Can you point me to an example that draws Chinese characters?" void. I could point you to such an example, but you'd still need a font program.

Once you have that font program: why wouldn't you use it the correct way? You should use that font program the way you're supposed to use it. You shouldn't use that font program to draw your glyphs as images as that would result in a PDF file with a huge filesize and a bad resolution (bad quality of the glyphs because you draw each separate character instead of using the font program in the PDF).

Did you look for a font program yet? There was a similar question about Vietnamese fonts a while ago: "Can't export Vietnamese characters to PDF using iText". It took me less than a quarter of my time to Google for a font that could be used. Why don't you spend a quarter of your time finding a font that supports Chinese?

Extra answer in reply to your extra comment:

  1. When we refer to CJK, we refer to a specific approach in which fonts aren't embedded, but rely on a font pack being installed on the end user's machine, so that Adobe Reader can use that font. You don't want this, so all your questions about using the iText-Asian jar, MSung-Light and so on are irrelevant.
  2. The Chinese character set is huge and many computers ship without any Chinese fonts (especially in the US), so the answer to your question "Isn't there any way to use a built-in arialuni" is "No, you shouldn't count on that!"
  3. What you say about Vietnamese is irrelevant. A font is a font is a font. You have a character code on one side and a glyph on the other side. The glue that connects one with the other is the encoding. For instance: you have the hexadecimal character codes B2E2 and CAD4. If the encoding is GBK, the corresponding glyphs are 测 and 试. Note that if you wanted to represent the very same characters in Unicode, you'd use the code points 6D4B and 8BD5. There is very little difference with other systems. For instance: you have the hexadecimal character code 41 (65 in decimal), and if the encoding is Latin-1, the corresponding glyph is A.
  4. I have asked you to search for a font that supports Chinese. I have opened Google and I searched for the keywords "Chinese fonts". I found this page: http://www.freechinesefont.com/ and I picked a font that seemed OK to me: http://www.freechinesefont.com/simplified-hxb-mei-xin-download/

Now I use this code snippet:

import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

public class ChineseTest {
    /** Path to the resulting PDF file. */
    public static final String DEST = "results/test.pdf";
    /** Path to the Chinese font. */
    public static final String FONT = "resources/hxb-meixinti.ttf";

    /**
     * Creates a PDF file: results/test.pdf
     * @param    args    no arguments needed
     */
    public static void main(String[] args) throws DocumentException, IOException {
        new ChineseTest().createPdf(DEST);
    }

    /**
     * Creates a PDF document.
     * @param filename the path to the new PDF document
     * @throws    DocumentException 
     * @throws    IOException 
     */
    public void createPdf(String filename) throws DocumentException, IOException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter.getInstance(document, new FileOutputStream(filename));
        // step 3
        document.open();
        BaseFont bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf,15);
        // step 4
        document.add(new Paragraph("\u6d4b\u8bd5", font));
        // step 5
        document.close();
    }
}

The result looks like this on Windows:

How is this different from Vietnamese? The word test is displayed correctly in Chinese. A subset of the font is embedded, which means you can keep the file size low. The text is not embedded as an image which means the quality of the text is excellent.
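The encoding point made earlier (GBK bytes B2E2 and CAD4 versus the Unicode escapes used in the snippet) can be checked with plain Java, without iText. The following is only a minimal sketch: the class name EncodingDemo and its two helper methods are made up for illustration, and it assumes the JRE ships the GBK charset (standard JDK builds do; a stripped-down runtime may not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of iText or of the original answer.
class EncodingDemo {
    /** Decodes raw bytes into a String using the named charset. */
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Lists the Unicode code points of a string as "U+XXXX" tokens. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two GBK character codes from the answer: B2E2 and CAD4.
        byte[] gbk = {(byte) 0xB2, (byte) 0xE2, (byte) 0xCA, (byte) 0xD4};
        String text = decode(gbk, "GBK");

        // Same characters as the literal "\u6d4b\u8bd5" passed to the Paragraph.
        System.out.println(codePoints(text));              // U+6D4B U+8BD5
        System.out.println(text.equals("\u6d4b\u8bd5"));   // true

        // The Latin-1 case: character code 0x41 maps to the glyph A.
        System.out.println(new String(new byte[] {0x41}, StandardCharsets.ISO_8859_1)); // A
    }
}
```

Once the bytes are decoded, the Java String holds Unicode characters regardless of the source encoding, which is why the iText snippet can pass "\u6d4b\u8bd5" directly when the font is created with IDENTITY_H.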

Extra answer in answer to your extra comment:

In your comment, you claim that the example that uses the file hxb-meixinti.ttf requires the installation of a font. That is incorrect. hxb-meixinti.ttf is merely a file that is read by iText and used to embed the definition of specific glyphs (a subset of the font) into a PDF.

When you write: "Related to a Font-Program: Java seems to be able to do it without using external software." Java is able to use fonts because Java uses font files, just the same way as iText uses font files.

For more info, read Supported Fonts in the Java manual. I quote:

Physical fonts need to be installed in locations known to the Java runtime environment. The JRE looks in two locations: the lib/fonts directory within the JRE itself, and the normal font location(s) defined by the host operating system. If fonts with the same name exist in both locations, the one in the lib/fonts directory is used.

What I tried explaining (and what you have been ignoring since the start of this thread) is that iText needs access to a physical font. iText can accept a font from file or as a byte[], but you need to provide something like a TTF, OTF, TTC, AFM+PFB. This is not different from how Java works.
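To make the "file or byte[]" point concrete, here is a rough sketch in plain Java that loads a font file into a byte[] and looks at its first four bytes to guess whether it is a TTF, OTF or TTC. The class FontBytes and the method sniffFormat are hypothetical names invented for this illustration; the check is a quick sanity test, not how iText itself parses fonts. The iText call is shown only in a comment, assuming the six-argument BaseFont.createFont overload that takes the font data as a byte[].

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of iText.
class FontBytes {
    /** Guesses the container format of a font program from its first four bytes. */
    public static String sniffFormat(byte[] font) {
        if (font == null || font.length < 4) {
            return "unknown";
        }
        // TrueType outlines start with the version number 0x00010000.
        if (font[0] == 0 && font[1] == 1 && font[2] == 0 && font[3] == 0) {
            return "TTF";
        }
        String tag = new String(font, 0, 4, StandardCharsets.ISO_8859_1);
        if (tag.equals("OTTO")) {
            return "OTF"; // OpenType with CFF outlines
        }
        if (tag.equals("ttcf")) {
            return "TTC"; // TrueType collection
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FontBytes <path-to-font-file>");
            return;
        }
        // The font program is just bytes; no installation on the OS is involved.
        byte[] fontData = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " looks like: " + sniffFormat(fontData));

        // With iText 5 on the classpath, the same bytes could then be passed on,
        // for example (assumed overload, check your iText version):
        // BaseFont bf = BaseFont.createFont("hxb-meixinti.ttf", BaseFont.IDENTITY_H,
        //         BaseFont.EMBEDDED, true, fontData, null);
    }
}
```

Either way, the font data has to come from somewhere you control: a file on disk, a resource bundled in your jar, or bytes fetched by your application.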

In your comment you also say that you want Adobe Reader to accept a byte stream instead of reading a PDF from file. This is not possible. Adobe Reader always requires the presence of the PDF file on disk. Even if the PDF file is served by a browser, the bytes of the PDF are stored as a temporary file. This is inherent to your request that the file needs to be viewed in Adobe Reader.

The rest of your comment is unclear. What do you mean by "If everyone would just upload anything he might need a switch causes difficulties"? Are you talking about downloading instead of uploading? Also: I gave you a solution that doesn't require downloading anything extra on the client side, yet you keep on nagging that no one will install anything on Acrobat.

As for your remark "For BS I got a solution recently", I have no idea what you mean by BS.