Set encoding when converting text file to pdf usin

Posted 2020-04-07 20:21

Question:

I'm trying to get iText to output my UTF-8 encoded text correctly. The input file contains symbols like ° and accented Latin characters (é, è, à...).

I haven't found a solution. This is the code I'm using:

BufferedReader input = null;
Document output = null;
System.out.println("Convert text file to pdf");
System.out.println("input  : " + args[0]);
System.out.println("output : " + args[1]);
try {
  // text file to convert to pdf as args[0]
  input = 
    new BufferedReader (new FileReader(args[0]));
  // letter 8.5x11
  //    see com.lowagie.text.PageSize for a complete list of page-size constants.
  output = new Document(PageSize.LETTER, 40, 40, 40, 40);
  // pdf file as args[1]
  PdfWriter.getInstance(output, new FileOutputStream (args[1]));

  output.open();
  output.addAuthor("RealHowTo");
  output.addSubject(args[0]);
  output.addTitle(args[0]);

  BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
  Font font = new Font(courier, 12, Font.NORMAL);
  Chunk chunk = new Chunk("",font);
  output.add(chunk); 

  String line = "";
  while(null != (line = input.readLine())) {
    System.out.println(line);
    Paragraph p = new Paragraph(line);
    p.setAlignment(Element.ALIGN_JUSTIFIED);
    output.add(p);
  }
  System.out.println("Done.");
  output.close();
  input.close();
  System.exit(0);
}
catch (Exception e) {
  e.printStackTrace();
  System.exit(1);
}
}

Any idea will be appreciated.

Answer 1:

When I look at your code, I see a number of things that are odd.

  1. You say you require UTF-8, but you create a BaseFont object using BaseFont.CP1252 instead of BaseFont.IDENTITY_H (which is the "encoding" you need when you work with Unicode).
  2. You use the standard Type 1 font Courier, which is a font that doesn't know how to render é,è,à... and a font that is never embedded. As documented, the BaseFont.EMBEDDED parameter is ignored in this case!
  3. You don't use this font with an object that has actual content. The actual content is put into a Paragraph that is created using the default font "Helvetica", a font that doesn't know how to render é,è,à...

To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H.
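
The advice above could be sketched roughly as follows. This is only an illustration under assumptions: it uses the `com.lowagie.text` packages mentioned in the question's comments (iText 5 ships the same class names under `com.itextpdf.text`), the class name `TextToPdf` is made up, and `fonts/courier.ttf` is a hypothetical path that should be replaced with a TrueType font you actually have and are allowed to embed. It also reads the input with an explicit UTF-8 reader, so that the text reaching the font is decoded correctly.

```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfWriter;

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class TextToPdf {

    // Hypothetical location of a TrueType font; not taken from the original post.
    private static final String FONT_PATH = "fonts/courier.ttf";

    public static void main(String[] args) throws IOException, DocumentException {
        Document output = new Document(PageSize.LETTER, 40, 40, 40, 40);

        // args[0] is the UTF-8 text file, args[1] is the PDF to create.
        try (BufferedReader input =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             FileOutputStream pdf = new FileOutputStream(args[1])) {

            PdfWriter.getInstance(output, pdf);
            output.open();

            // IDENTITY_H addresses glyphs by Unicode; EMBEDDED stores the font in the PDF.
            BaseFont base =
                BaseFont.createFont(FONT_PATH, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            Font font = new Font(base, 12, Font.NORMAL);

            String line;
            while ((line = input.readLine()) != null) {
                // Pass the font to the Paragraph that carries the actual content.
                output.add(new Paragraph(line, font));
            }

            // Close the document before the underlying stream is closed.
            output.close();
        }
    }
}
```

The key difference from the question's code is that the font is handed to each `Paragraph`, rather than to an empty `Chunk` that is never shown.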



Answer 2:

Both the reader and the writer should be set to use UTF-8 character set encoding to read/write UTF-8 characters properly. For example,

input = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));
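
To illustrate why the reader's charset matters, here is a small sketch that uses only the JDK. The class and method names (`Utf8ReadDemo`, `readLines`) are invented for this example, and ISO-8859-1 merely stands in for a platform default that is not UTF-8. The sample text is written with Unicode escapes so that the source compiles the same way regardless of the compiler's encoding setting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class Utf8ReadDemo {

    // Reads every line of the file, decoding its bytes with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "25 °C éèà", written with escapes to stay independent of source encoding.
        String text = "25 \u00b0C \u00e9\u00e8\u00e0";

        Path file = Files.createTempFile("utf8-demo", ".txt");
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));

        // Decoding with the charset the file was written in restores the text.
        String correct = readLines(file, StandardCharsets.UTF_8).get(0);
        System.out.println("UTF-8 reader matches original: " + correct.equals(text));

        // Decoding the same bytes as ISO-8859-1 turns each accented character
        // into two characters, for example U+00E9 becomes U+00C3 U+00A9.
        String garbled = readLines(file, StandardCharsets.ISO_8859_1).get(0);
        System.out.println("Length with UTF-8: " + correct.length()
            + ", with ISO-8859-1: " + garbled.length());

        Files.delete(file);
    }
}
```

A `FileReader` created without a charset, as in the question, decodes with the platform default, so the result depends on the machine the program runs on.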


Answer 3:

@AmiraGL,

The solution proposed by Bruno Lowagie fixed my problem (p:dataExporter PDF export does not show the Euro (€) sign). It may solve yours as well.

"To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H." (Bruno Lowagie)

BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
Font cellFont = new Font(courier, 12, Font.NORMAL);

Solution: https://stackoverflow.com/a/21259711/3557631



Tags: java itext