Arabic characters do not display correctly [duplicate]

Published 2020-05-07 19:15

Question:

For my website, I use itextpdf 5.5.4 to generate PDF downloads. The website is meant for English speakers. Recently, a user from Egypt used the site, entered some Arabic content, and contacted me about a problem he ran into.

This is his Arabic content shown correctly in the browser:

This is incorrect display in PDF:

Here is my Java code. Please note that it does generate PDFs with Chinese characters CORRECTLY:

// Unicode font with Identity-H encoding, embedded in the PDF
BaseFont base = BaseFont.createFont("/fonts/ARIALUNI.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font f = new Font(base, 10f);
String htmlString = string_with_Arabic_text;
Paragraph p = new Paragraph(htmlString, f);
p.setSpacingBefore(20.0f);
p.setSpacingAfter(7.0f);
document.add(p);

How can I fix this?

In Eclipse (the IDE I use), the Arabic characters in htmlString display correctly. At the moment, I cannot upgrade to the latest version of itextpdf for various project reasons.

Answer 1:

iText 5 has limited support for non-Western writing systems. It supports right-to-left writing, but only in the context of ColumnText and PdfPCell objects.

This is an iText 5 example with ColumnText where p contains text in Arabic:

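ColumnText canvas = new ColumnText(writer.getDirectContent());
// Column rectangle: lower-left x, lower-left y, upper-right x, upper-right y
canvas.setSimpleColumn(36, 750, 559, 780);
// Arabic must be laid out right to left
canvas.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
canvas.addElement(p);
canvas.go();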

This is an iText 5 example with PdfPCell where p contains text in Arabic:

PdfPCell cell = new PdfPCell(p);
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);

This is very annoying, as it means rewriting your entire application so that all text is added to either a ColumnText or a PdfPCell object. You'd also have to examine the content to decide whether the run direction needs to be right-to-left.
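Examining the content does not require iText itself. A minimal sketch, assuming the JDK's `java.text.Bidi` class is an acceptable way to make the decision: it takes the paragraph's base direction from the first strong directional character (rules P2 and P3 of the Unicode Bidirectional Algorithm). The class and method names `RunDirectionCheck` and `isRightToLeft` are hypothetical, and mapping the boolean to `PdfWriter.RUN_DIRECTION_RTL` or `PdfWriter.RUN_DIRECTION_LTR` is left to the caller:

```java
import java.text.Bidi;

public class RunDirectionCheck {

    // Hypothetical helper: true when the base direction of the paragraph is
    // right-to-left under the Unicode Bidirectional Algorithm, i.e. when the
    // first strong directional character is Arabic, Hebrew, etc.
    public static boolean isRightToLeft(String text) {
        char[] chars = text.toCharArray();
        // Fast path: text without any RTL characters (Latin, Chinese, ...)
        // never needs a right-to-left run direction.
        if (!Bidi.requiresBidi(chars, 0, chars.length)) {
            return false;
        }
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT: use the direction of the first strong
        // character, falling back to LTR if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        // Non-ASCII text is written as escapes so the file compiles in any source encoding.
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // Arabic for "hello"
        String chinese = "\u4F60\u597D";                  // Chinese for "hello"

        System.out.println(isRightToLeft("Hello"));               // false
        System.out.println(isRightToLeft(chinese));               // false
        System.out.println(isRightToLeft(arabic));                // true
        System.out.println(isRightToLeft("Order 42: " + arabic)); // false: first strong char is Latin

        // With iText 5, the caller would then do something like:
        // cell.setRunDirection(rtl ? PdfWriter.RUN_DIRECTION_RTL : PdfWriter.RUN_DIRECTION_LTR);
    }
}
```

This only decides the base direction of a whole paragraph; reordering mixed Arabic/Latin runs inside it is still left to iText's ColumnText or PdfPCell once the run direction is set.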

As you have to rewrite the application anyway, it would be best to upgrade to iText 7, because iText 7 has an add-on that detects the writing system based on the Unicode values of the content (see pdfCalligraph). When Arabic or Hebrew text is detected, the add-on changes the run direction from left-to-right to right-to-left. See How to display Arabic strings from RTL in PDF generated using itext 7 API?
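The general idea of detecting a writing system from the Unicode values of the content can be illustrated with the JDK alone. This is not how pdfCalligraph works internally (the add-on is closed source); it is a minimal sketch that counts letters per `Character.UnicodeScript` and reports the most frequent script. The names `ScriptDetector`, `dominantScript` and `isRightToLeftScript` are hypothetical:

```java
import java.util.EnumMap;
import java.util.Map;

public class ScriptDetector {

    // Hypothetical helper: returns the Unicode script that occurs most often
    // among the letters of the text, or COMMON if the text contains no letters.
    public static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        // Iterate over code points rather than chars so supplementary characters
        // are classified correctly; marks, digits and punctuation are skipped.
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));

        Character.UnicodeScript best = Character.UnicodeScript.COMMON;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Only the two right-to-left scripts mentioned in the answer are covered here.
    public static boolean isRightToLeftScript(Character.UnicodeScript script) {
        return script == Character.UnicodeScript.ARABIC
            || script == Character.UnicodeScript.HEBREW;
    }

    public static void main(String[] args) {
        // Non-ASCII text is written as escapes so the file compiles in any source encoding.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";      // Arabic for "hello"
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947"; // "namaste" in Devanagari

        System.out.println(dominantScript("Hello"));                     // LATIN
        System.out.println(dominantScript(arabic));                      // ARABIC
        System.out.println(dominantScript(hindi));                       // DEVANAGARI
        System.out.println(isRightToLeftScript(dominantScript(arabic))); // true
    }
}
```

Detecting the script is only the first step: scripts such as Arabic and Devanagari also need glyph shaping, which is the part the answer attributes to pdfCalligraph.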

I see that you are building your document in code. Please note that you can save yourself a lot of work by creating the content in HTML and then converting it to PDF using the pdfHTML add-on. The HTML to PDF tutorial has some examples involving Arabic. See the section on internationalization in chapter 6, and the following FAQ entries:

  • Which languages are supported in pdfHTML?
  • How to convert HTML containing Arabic/Hebrew characters to PDF?

iText 7 is also the first version that supports other writing systems, such as Devanagari, Tamil, Telugu, and more. For more info, read the pdfCalligraph white paper.

Important: the pdfCalligraph add-on is closed source. You'll need a trial license to test it and a commercial license to use it in production. Note that the version of iText you are currently using is licensed as AGPL software, which implies that your project can't be used in a closed source context. You mention external users, which means that you are distributing your service. Have you open-sourced all of your own source code? If not, you should purchase a commercial license for your use of iText.