Set encoding when converting text file to pdf usin

I'm trying to get iText to output my UTF-8 encoded text correctly. The input file contains symbols like ° and accented Latin characters (é, è, à, ...).

But I haven't found a solution. This is the code I'm using:

BufferedReader input = null;
Document output = null;
System.out.println("Convert text file to pdf");
System.out.println("input  : " + args[0]);
System.out.println("output : " + args[1]);
try {
  // text file to convert to pdf as args[0]
  input = 
    new BufferedReader (new FileReader(args[0]));
  // letter 8.5x11
  //    see com.lowagie.text.PageSize for a complete list of page-size constants.
  output = new Document(PageSize.LETTER, 40, 40, 40, 40);
  // pdf file as args[1]
  PdfWriter.getInstance(output, new FileOutputStream (args[1]));

  output.open();
  output.addAuthor("RealHowTo");
  output.addSubject(args[0]);
  output.addTitle(args[0]);

  BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
  Font font = new Font(courier, 12, Font.NORMAL);
  Chunk chunk = new Chunk("",font);
  output.add(chunk); 

  String line = "";
  while(null != (line = input.readLine())) {
    System.out.println(line);
    Paragraph p = new Paragraph(line);
    p.setAlignment(Element.ALIGN_JUSTIFIED);
    output.add(p);
  }
  System.out.println("Done.");
  output.close();
  input.close();
  System.exit(0);
}
catch (Exception e) {
  e.printStackTrace();
  System.exit(1);
}
}

Any idea will be appreciated.

Tags: java itext

3 answers

够拽才男人

Reply #2 · 2020-04-07 20:11

When I look at your code, I see a number of things that are odd.

You say you require UTF-8, but you create a BaseFont object using BaseFont.CP1252 instead of BaseFont.IDENTITY_H (which is the "encoding" you need when you work with Unicode).
You use the standard Type 1 font Courier, which is a font that doesn't know how to render é,è,à... and a font that is never embedded. As documented, the BaseFont.EMBEDDED parameter is ignored in this case!
You don't use this font with an object that has actual content. The actual content is put into a Paragraph that is created using the default font "Helvetica", a font that doesn't know how to render é,è,à...

To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H.


不美不萌又怎样

Reply #3 · 2020-04-07 20:25

Both the reader and the writer should be set to use UTF-8 character set encoding to read/write UTF-8 characters properly. For example,

input = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));
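
As a rough sketch of the reading side only, the snippet below decodes a file as UTF-8 using nothing but the standard library. The class name Utf8LineReader, the readLines helper, and the temporary sample file are made up for illustration and are not part of the question's code; the PDF side (font and encoding in iText) is not covered here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class Utf8LineReader {

    // Reads every line of the file, decoding the bytes as UTF-8
    // instead of relying on the platform default charset.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: write a small UTF-8 file, then read it back.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "25 \u00B0C\n\u00E9\u00E8\u00E0\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(sample)) {
            System.out.println(line);
        }
        Files.delete(sample);
    }
}
```

The FileReader(String) constructor used in the question decodes with the platform default charset, so naming the charset explicitly, as above, is what makes the read independent of the machine it runs on.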


Root（大扎）

Reply #4 · 2020-04-07 20:31

@AmiraGL,

The solution proposed by Bruno Lowagie fixed my problem (p:dataExporter PDF export does not show the Euro (€) sign). It may solve yours as well.

To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H. -by Bruno Lowagie

BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
Font cellFont = new Font(courier, 12, Font.NORMAL);

Solution: https://stackoverflow.com/a/21259711/3557631

