Jasperreport CSV UTF-8 without BOM instead of UTF-

Posted 2019-08-11 13:45

Question:

I am trying to export a CSV file with JasperReports; the problem appears when I want to print a currency symbol such as '€'.

While searching for a solution, I realized that it comes down to file encoding, so I wrote this code:

//JasperPrint is already filled

HttpServletResponse httpServletResponse = (HttpServletResponse) FacesContext.getCurrentInstance().getExternalContext().getResponse();
httpServletResponse.setContentType("application/csv; charset="+Charset.forName("utf-8").displayName());
httpServletResponse.setCharacterEncoding(Charset.forName("utf-8").displayName());
httpServletResponse.addHeader("Content-disposition", "attachment; filename=nameoffile.csv");
httpServletResponse.addHeader("Content-type", "application/csv; charset="+Charset.forName("utf-8").displayName());
ServletOutputStream servletOutputStream = httpServletResponse.getOutputStream();
JRCsvExporter exporter = new JRCsvExporter();

exporter.setParameter(JRExporterParameter.JASPER_PRINT, jasperPrint);
exporter.setParameter(JRExporterParameter.CHARACTER_ENCODING, Charset.forName("utf-8").displayName());
exporter.setParameter(JRExporterParameter.OUTPUT_STREAM, servletOutputStream);
exporter.setParameter(JRCsvExporterParameter.CHARACTER_ENCODING, Charset.forName("utf-8").displayName());
exporter.setParameter(JRCsvExporterParameter.FIELD_DELIMITER, ";");

The file exported by JasperReports is encoded as "UTF-8 without BOM". So when I open the file with Excel, '€' looks like 'â‚¬', but when I open it with Notepad++, '€' looks like '€'.
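To illustrate why the two programs disagree, here is a small sketch that decodes the UTF-8 bytes of the euro sign as windows-1252, which is an assumption about what Excel falls back to when the file has no BOM. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes as windows-1252,
    // imitating a reader that guesses the wrong encoding.
    static String misreadAsCp1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign, written as an escape
        String garbled = misreadAsCp1252(euro);
        // The euro sign is three bytes in UTF-8 (E2 82 AC), and each byte
        // is shown as its own windows-1252 character.
        System.out.println(garbled.length()); // prints 3
        System.out.println(garbled);
    }
}
```

A reader that decodes the same bytes as UTF-8, as Notepad++ does here, gets the single euro character back.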

In Notepad++, I converted the file encoding to UTF-8 (with BOM, I think), then saved the file. I opened the file with Excel and ---EUREKA---, '€' looks like '€'.

So the main question is: how do I encode the file as "UTF-8 with BOM"?
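For reference, "UTF-8 with BOM" just means the file starts with the three bytes EF BB BF. The sketch below shows that idea with plain java.io only; it is not a JasperReports API and not taken from the answers. In the servlet code above, the equivalent would presumably be writing those three bytes to servletOutputStream before the exporter runs, assuming nothing else has written to the stream yet. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8BomDemo {
    // The UTF-8 byte order mark: EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Writes the BOM first, then the text encoded as UTF-8.
    static void writeWithBom(OutputStream out, String csv) throws IOException {
        out.write(UTF8_BOM);
        out.write(csv.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeWithBom(buffer, "price;\u20AC\n");
        // Print the first three bytes in hex to show the BOM is in front.
        byte[] bytes = buffer.toByteArray();
        for (int i = 0; i < 3; i++) {
            System.out.printf("%02X ", bytes[i] & 0xFF);
        }
        System.out.println();
    }
}
```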

UPDATE

I tried this jrxml:

<?xml version="1.0" encoding="UTF-8"?>
<jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd" name="report2" language="groovy" pageWidth="100" pageHeight="842" columnWidth="100" leftMargin="0" rightMargin="0" topMargin="0" bottomMargin="0" uuid="b7ec44fd-90d0-4ecc-8f99-0e5eafc16828">
    <property name="ireport.zoom" value="1.0"/>
    <property name="ireport.x" value="0"/>
    <property name="ireport.y" value="0"/>
    <parameter name="toPrint" class="java.lang.String"/>
    <title>
        <band height="20" splitType="Stretch">
            <textField>
                <reportElement x="0" y="0" width="100" height="20" uuid="d2c55a11-b407-407b-b117-3b04d20cccec"/>
                <textFieldExpression><![CDATA[$P{toPrint}]]></textFieldExpression>
            </textField>
        </band>
    </title>
</jasperReport>

I set toPrint = €€€ ££££ €£ for the preview. The PDF works fine, but when I save the file as CSV I see "€€€ ££££ €£"

Answer 1:

Instead of resolving the encoding problem, you could consider using a different notation.

Putting special characters directly in code is generally bad practice: switching the encoding of the output file, compiling with a compiler that expects a different encoding, etc., will corrupt the result.

Any character in the Basic Multilingual Plane can be written as \u followed by its four-digit hexadecimal code point.

The € is U+20AC.

So instead of typing € you can put \u20AC in your jrxml code.

Example

<textField>
    <reportElement x="0" y="0" width="100" height="25" uuid="bc2ae040-f9af-4732-82fe-8fe8b71696bd"/>
    <textFieldExpression><![CDATA["\u20AC"]]></textFieldExpression>
</textField>
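As a quick check of the escape notation in plain Java (outside any report), the sketch below shows that the escape for U+20AC yields a single char, while a character beyond the Basic Multilingual Plane needs two escapes (a surrogate pair). The class and helper names are made up for this example.

```java
public class UnicodeEscapeDemo {
    // Returns the first code point of the string as lowercase hex.
    static String hexCodePoint(String s) {
        return Integer.toHexString(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // one escape, one char
        System.out.println(euro.length());       // prints 1
        System.out.println(hexCodePoint(euro));  // prints 20ac

        // U+1F600 does not fit in four hex digits, so Java source
        // spells it as a surrogate pair of two escapes.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());      // prints 2
        System.out.println(hexCodePoint(emoji)); // prints 1f600
    }
}
```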

EDIT: After the comment "but the value I want to print is not a static value": convert the value to Unicode escapes.

Example of java code

public static String getAsUnicode(String value) {
    if (value == null) {
        return null;
    }
    // Each char becomes six characters, so size the builder up front
    StringBuilder ret = new StringBuilder(value.length() * 6);
    for (char ch : value.toCharArray()) {
        ret.append(getUnicodeEscaped(ch));
    }
    return ret.toString();
}

// Returns the escape text for one char: backslash, 'u', then four zero-padded hex digits
public static String getUnicodeEscaped(char ch) {
    return String.format("\\u%04x", (int) ch);
}

and in jrxml call your method:

<textField>
    <reportElement x="0" y="0" width="100" height="25" uuid="bc2ae040-f9af-4732-82fe-8fe8b71696bd"/>
    <textFieldExpression><![CDATA[MyClass.getAsUnicode($P{toPrint})]]></textFieldExpression>
</textField>


Answer 2:

After searching, I found one solution: use the cp1252 encoding, which includes the '€' symbol. The final code is below:

//JasperPrint is already filled

HttpServletResponse httpServletResponse = (HttpServletResponse) FacesContext.getCurrentInstance().getExternalContext().getResponse();
httpServletResponse.setContentType("application/csv; charset=cp1252");
httpServletResponse.setCharacterEncoding("cp1252");
httpServletResponse.addHeader("Content-disposition", "attachment; filename=nameoffile.csv");
httpServletResponse.addHeader("Content-type", "application/csv; charset=cp1252");
ServletOutputStream servletOutputStream = httpServletResponse.getOutputStream();
JRCsvExporter exporter = new JRCsvExporter();

exporter.setParameter(JRExporterParameter.JASPER_PRINT, jasperPrint);
exporter.setParameter(JRExporterParameter.CHARACTER_ENCODING, "cp1252");
exporter.setParameter(JRExporterParameter.OUTPUT_STREAM, servletOutputStream);
exporter.setParameter(JRCsvExporterParameter.CHARACTER_ENCODING, "cp1252");
exporter.setParameter(JRCsvExporterParameter.FIELD_DELIMITER, ";");

// Run the export; the CSV is written to the servlet output stream
exporter.exportReport();
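
One thing to keep in mind with this approach: cp1252 is a single-byte encoding, so it covers '€' and '£' but not every Unicode character. The sketch below, using only java.nio and made-up class and method names, checks whether a string can be encoded as windows-1252 (the Java name for cp1252) before relying on it.

```java
import java.nio.charset.Charset;

public class Cp1252CoverageDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True if every character of the text has a windows-1252 byte.
    static boolean fitsInCp1252(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";  // euro sign
        String pound = "\u00A3"; // pound sign
        String rupee = "\u20B9"; // Indian rupee sign, not part of windows-1252

        System.out.println(fitsInCp1252(euro));  // prints true
        System.out.println(fitsInCp1252(pound)); // prints true
        System.out.println(fitsInCp1252(rupee)); // prints false

        // The euro sign is the single byte 0x80 in windows-1252.
        byte[] encoded = euro.getBytes(CP1252);
        System.out.printf("%d byte(s), first = %02X%n", encoded.length, encoded[0] & 0xFF);
    }
}
```

Characters that do not fit are typically replaced (often with '?') when the text is encoded, so this encoding only suits reports whose data stays within the cp1252 range.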