How can I get XSLT to return UTF-8 in Java

Posted 2020-06-27 09:32

Question:

I'm trying to get my XSL script to work with UTF-8 encoding. Characters like åäö and Greek characters just turn up as garbage. The only way to get it to work is if I write the result to a file. If I write it to an output stream, it only returns garbage (System.out works, but that might be because it's being redirected to a file).

The result needs to be returned from a servlet, and please note that it's not a servlet configuration issue. I can return a hard-coded string with Greek characters from the servlet and it works fine, so it's an issue with the transformation.

Here is my current (simplified) code.

protected void doGet(final HttpServletRequest request, final HttpServletResponse response) throws ServletException,
IOException {
    try {
        response.setCharacterEncoding("UTF-8");
        response.setContentType("text/html; charset=UTF-8");

        final TransformerFactory factory = this.getFactory();

        final File inFile = new File("infile.xml");
        final File xslFile = new File("template.xsl");
        final File outFile = new File("outfile.html");

        final Templates templates = factory.newTemplates(new StreamSource(xslFile));
        final Transformer transformer = templates.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        final InputStream in = new FileInputStream(inFile);
        final StreamSource source = new StreamSource(in);

        final StreamResult result1 = new StreamResult(outFile);
        final StreamResult result2 = new StreamResult(System.out);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final StreamResult result3 = new StreamResult(out);

        //transformer.transform(source, result1);
        //transformer.transform(source, result2);
        transformer.transform(source, result3);

        final Writer writer = response.getWriter();
        writer.write(new String(out.toByteArray()));
        writer.close();
        in.close();

    } catch (final TransformerConfigurationException e) {
        e.printStackTrace();
    } catch (final TransformerException e) {
        e.printStackTrace();
    }
}

Also, my XSL script contains the following:

<xsl:output method="html" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

What is the correct way to get this to work? I'm using Saxon for the transformation if that might be of any help.

Answer 1:

This is almost certainly the problem:

writer.write(new String(out.toByteArray()));

You've carefully encoded your text as UTF-8, and then you're converting into a string using the platform default encoding. You should pretty much never use the String constructors and methods which use the platform default encoding. Even if you want to use that encoding, do so explicitly.
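To illustrate the point about naming the encoding explicitly, here is a minimal standalone sketch. The class name ExplicitCharsetDemo and the helper decodeUtf8 are made up for this example; the only API used is the String constructor that takes a Charset. It decodes the same UTF-8 bytes once with the right charset and once with a mismatched one, which is roughly what happens when the platform default is not UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Decodes bytes known to be UTF-8, independent of the platform default charset.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source ASCII-only
        // (a-ring, a-umlaut, o-umlaut, then Greek alpha, beta, gamma).
        String original = "\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String correct = decodeUtf8(utf8);
        // Decoding the same bytes with a mismatched charset yields different characters.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + original.equals(correct));
        System.out.println("ISO-8859-1 decode intact: " + original.equals(wrong));
    }
}
```

In the question's code, the equivalent change would be passing the charset as the second argument when building the String from out.toByteArray().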

If you're going to write to a Writer anyway, why are you starting off writing to a ByteArrayOutputStream? Why not go straight to the Writer?

It would be better, however, to write straight to the response's output stream (response.getOutputStream()), and also set the response's content type to indicate that it's UTF-8.
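A rough sketch of writing straight to an output stream, under a few assumptions: it runs outside a servlet container, so a ByteArrayOutputStream stands in for response.getOutputStream(); it uses whatever TransformerFactory the JDK provides rather than Saxon specifically; and the inline stylesheet, the class name TransformToStreamDemo and the helper transformTo are invented for illustration (with method="xml" to keep the output simple).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformToStreamDemo {

    // Hypothetical stylesheet for illustration: wraps the document text in a <p> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes' encoding='UTF-8'/>"
      + "<xsl:template match='/'><p><xsl:value-of select='.'/></p></xsl:template>"
      + "</xsl:stylesheet>";

    // Lets the transformer serialize its result as UTF-8 bytes directly into 'out',
    // so no intermediate String (and no platform default charset) is involved.
    public static void transformTo(String xml, OutputStream out) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for response.getOutputStream() in a servlet.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformTo("<doc>\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3</doc>", out);

        // Decode explicitly as UTF-8 only to check that the characters survived.
        String decoded = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(decoded.contains("\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3"));
    }
}
```

In the servlet from the question, the same idea would amount to passing new StreamResult(response.getOutputStream()) to transform() and dropping the ByteArrayOutputStream and the Writer altogether.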

Note that if you really want to get the result as a String beforehand, use StringWriter. There's no point in writing to a ByteArrayOutputStream and then converting to a string.
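For the StringWriter route, something along these lines should work. This is only a sketch: the class name TransformToStringDemo and the helper transformToString are hypothetical, and it uses a stylesheet-free identity transform from the JDK's default TransformerFactory instead of the question's template. Because the result goes to a Writer, the characters never pass through a byte encoding on the way to the String.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformToStringDemo {

    // Runs an identity transform (no stylesheet) and collects the result as a String.
    public static String transformToString(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // StringWriter receives characters, so no byte-to-char decoding step is needed.
        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes keep this source ASCII-only.
        String result = transformToString("<doc>\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3</doc>");
        System.out.println(result.contains("\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3"));
    }
}
```

The resulting String can then be handed to response.getWriter(), which encodes it using the character encoding set on the response.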