Servlet gets weird character with US International

Posted 2019-03-30 23:47

Question:

I have a simple form where I can type some characters. These characters are sent to a servlet, which calls getBytes and prints the bytes. The correct UTF-8 bytes for "ã" are -61 and -93, but I get -52 and -93. :(

I tried everything to understand and fix this, but nothing worked. Everything on my machine should be UTF-8 so I suspect it has to do with the US International keyboard I have been using for 20 years.

Does any smart soul have a clue where -52 and -93 are coming from?

FIXED on Jetty: See my answer below.

BROKEN on Tomcat: How to get tomcat to understand MacRoman (x-mac-roman) charset from my Mac keyboard?

Answer 1:

That is the Mac OS Roman character encoding. (0xCC == -52 as a signed byte; in Mac OS Roman, 0xCC is "Ã" and 0xA3 is "£".)
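
One plausible way to arrive at -52 and -93 is a double conversion: the UTF-8 bytes of "ã" get decoded as ISO-8859-1, and the resulting "Ã£" is then encoded with a Mac OS Roman platform default. The sketch below only illustrates that arithmetic. The class and method names are made up for this example, and it assumes the JDK ships the x-MacRoman charset (it is part of the extended charsets in standard JDK builds).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MacRomanMojibake {
    // Hypothetical helper: replays the suspected double conversion.
    static byte[] misencode(String text) {
        // 1. The browser sends the text as UTF-8 bytes.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // 2. Assumption: the container decodes those bytes as ISO-8859-1.
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        // 3. Assumption: getBytes() without a charset falls back to a
        //    Mac OS Roman platform default; made explicit here.
        return misdecoded.getBytes(Charset.forName("x-MacRoman"));
    }

    public static void main(String[] args) {
        String aTilde = "\u00e3"; // "ã", escaped so the source encoding does not matter
        System.out.println(Arrays.toString(aTilde.getBytes(StandardCharsets.UTF_8))); // [-61, -93]
        System.out.println(Arrays.toString(misencode(aTilde)));                       // [-52, -93]
    }
}
```

If this is what happens on your machine, the second byte stays -93 only by coincidence: "£" sits at 0xA3 in both ISO-8859-1 and Mac OS Roman.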

Some things to check:

  • Use explicit charsets: string.getBytes("UTF-8") and new String(bytes, "UTF-8").
  • The page containing the form should be served as UTF-8: response.setContentType("text/html; charset=UTF-8"); or, in a JSP, <%@page pageEncoding="UTF-8"%>.
  • <form action="..." accept-charset="UTF-8">
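
As a minimal sketch of the first check, the snippet below encodes and decodes with an explicit charset instead of the platform default. It also shows a common workaround for a parameter that was already decoded as ISO-8859-1; whether that applies depends on your container's configuration. The class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsets {
    // Encode with an explicit charset; never rely on the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical repair step: assumes the text was UTF-8 on the wire but
    // was decoded as ISO-8859-1, so the original bytes can be recovered.
    static String repairLatin1(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String aTilde = "\u00e3"; // "ã"
        System.out.println(fromUtf8(toUtf8(aTilde)).equals(aTilde));     // true
        System.out.println(repairLatin1("\u00c3\u00a3").equals(aTilde)); // true
    }
}
```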

If none of that helps:

Set up a request filter in your web application (web.xml), for example one that calls request.setCharacterEncoding("UTF-8") before any parameter is read.


Encoding in pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>...</version>
    <configuration>
        <source>1.6</source>
        <target>1.6</target>
        <encoding>${project.build.sourceEncoding}</encoding>
    </configuration>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <version>...</version>
    <configuration>
        <encoding>${project.build.sourceEncoding}</encoding>
    </configuration>
</plugin>
...
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>


Answer 2:

OK, after a good 8 hours (seriously!) it looks like the only way to get this working correctly is the following.

One of the problems was that the Maven build compiled the class files with the wrong encoding:

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
mvn clean install
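
To check whether the JVM actually picked up the setting, a small program like the following can print the default charset in effect. This is only a diagnostic sketch with a made-up class name. Note that newer Java releases default to UTF-8 on their own, so the output depends on the JDK version as well as on file.encoding.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetCheck {
    // getBytes() without arguments encodes with the JVM default charset.
    static boolean noArgUsesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // The bytes printed here change with the default charset.
        System.out.println(Arrays.toString("\u00e3".getBytes()));
    }
}
```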

AND:

   <%@page pageEncoding="UTF-8" %>

NOW:

As far as I know, there is no way to pass the latter option in your pom.xml.

There is a pending answer for that here: enabling UTF-8 encoding for Clojure source files