I have a simple form where I can type some characters. These characters are sent to a servlet, which calls getBytes and prints the bytes. The correct UTF-8 bytes for "ã" are -61 and -93, but I get -52 and -93. :(
I tried everything to understand and fix this, but nothing worked. Everything on my machine should be UTF-8 so I suspect it has to do with the US International keyboard I have been using for 20 years.
Does any smart soul have a clue where -52 and -93 are coming from?
FIXED on Jetty: See my answer below.
BROKEN on Tomcat: How to get tomcat to understand MacRoman (x-mac-roman) charset from my Mac keyboard?
That is the Mac OS Roman character encoding. (0xCC == -52.)
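The arithmetic can be checked with a small sketch. It assumes one plausible path for the bug (not confirmed in the question): the UTF-8 bytes get decoded as ISO-8859-1 somewhere, and the resulting string is re-encoded with a MacRoman default charset. The class and method names are made up for illustration, and the x-MacRoman charset is not guaranteed to be present in every JDK, so the code guards for it. On a JDK that has it, the last line should reproduce the -52 and -93 from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MacRomanDemo {

    // UTF-8 bytes of the given text, as Java's signed bytes.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Simulates the suspected failure: the UTF-8 bytes are decoded as
    // ISO-8859-1 and the resulting string is re-encoded as MacRoman.
    // Returns null when this JVM has no x-MacRoman charset.
    static byte[] misencode(String text) {
        if (!Charset.isSupported("x-MacRoman")) {
            return null;
        }
        String misread = new String(utf8(text), StandardCharsets.ISO_8859_1);
        return misread.getBytes(Charset.forName("x-MacRoman"));
    }

    public static void main(String[] args) {
        // "ã", written as an escape so the source file encoding does not matter
        String aTilde = "\u00e3";
        System.out.println(Arrays.toString(utf8(aTilde))); // [-61, -93]
        System.out.println((byte) 0xCC);                   // -52
        System.out.println(Arrays.toString(misencode(aTilde)));
    }
}
```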
Some things to check:
- Pass the encoding explicitly: string.getBytes("UTF-8") and new String(bytes, "UTF-8").
- The form should have been sent in UTF-8: response.setContentType("text/html; charset=UTF-8"); in a servlet, or <%@page pageEncoding="UTF-8"%> in a JSP.
- <form action="..." accept-charset="UTF-8">
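As a minimal sketch of the explicit-charset point above: the class and method names here are hypothetical, and StandardCharsets needs Java 7 or later (on Java 6 the string form "UTF-8" does the same job, at the cost of a checked exception).

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {

    // Encode with a named charset instead of the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used for encoding.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e3"; // "ã"
        byte[] bytes = encode(original);
        System.out.println(bytes.length);                   // 2
        System.out.println(decode(bytes).equals(original)); // true

        // Decoding with a different charset does not round-trip.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(original));         // false
    }
}
```

The no-argument getBytes() and new String(bytes) use the platform default charset, which is why the result can differ from one machine to the next.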
As all that did not help:
Set up request filtering in your web application (web.xml).
Encoding in pom.xml:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>...</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>...</version>
<configuration>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
...
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
OK, after a good 8 hours (seriously!), it looks like the only way to get this working correctly is the following.
One of the problems was that the Maven build compiled the class files with the wrong encoding, so:
export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
mvn clean install
AND:
<%@page pageEncoding="UTF-8" %>
NOW:
As far as I know, there is no way to pass the latter option in your pom.xml.
Here is a pending answer for that: enabling UTF-8 encoding for clojure source files
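To see whether the JAVA_TOOL_OPTIONS setting was actually picked up, a quick check like the following can help. It is only a sketch with a made-up class name; it prints whatever default charset the JVM it runs in has settled on, so it has to be run in the same environment as the build or the container.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports the charset the JVM uses when none is given explicitly,
    // together with the raw file.encoding system property.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```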