I have a simple form where I can type some characters. These characters are sent to a servlet that calls `getBytes()` and prints the bytes. The correct UTF-8 bytes for "ã" are -61 and -93, but I get -52 and -93. :(
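For reference, the expected values can be checked in plain Java. This sketch uses the escape `\u00e3` instead of a literal "ã", so the result does not depend on the encoding the source file is compiled with. It also passes the charset explicitly, so it does not rely on the platform default that a bare `getBytes()` uses. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Bytes {
    // Returns the UTF-8 encoding of the given text as (signed) Java bytes.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = utf8("\u00e3"); // "ã", U+00E3
        // Java bytes are signed: 0xC3 (195) prints as -61, 0xA3 (163) as -93.
        System.out.println(Arrays.toString(bytes)); // [-61, -93]
    }
}
```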
I have tried everything I can think of to understand and fix this, but nothing worked. Everything on my machine should be UTF-8, so I suspect it has to do with the US International keyboard layout I have been using for 20 years.
Does any smart soul have a clue where -52 and -93 are coming from?
FIXED on Jetty: See my answer below.
BROKEN on Tomcat: How do I get Tomcat to understand the MacRoman (x-mac-roman) charset coming from my Mac keyboard?
OK, after a good 8 hours (seriously!), it looks like the only way to get this working correctly is to do:
One of the problems was a bad Maven build: the class files were compiled with the wrong source encoding.
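To illustrate how a build-encoding mismatch corrupts string literals, assume the source file was saved as UTF-8 but javac read it as Mac OS Roman. That can happen when the Maven compiler falls back to the platform default encoding on a Mac. Decoding the two UTF-8 bytes of "ã" as `x-MacRoman` then yields two different characters. `x-MacRoman` is the JDK's name for its extended Mac OS Roman charset and may be missing from stripped-down runtimes. The method name `misread` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {
    // Simulates a compiler decoding UTF-8 source bytes with the wrong charset.
    static String misread(String literal, Charset assumedSourceCharset) {
        byte[] onDisk = literal.getBytes(StandardCharsets.UTF_8); // how the editor saved it
        return new String(onDisk, assumedSourceCharset);          // how the compiler decoded it
    }

    public static void main(String[] args) {
        String seen = misread("\u00e3", Charset.forName("x-MacRoman"));
        // In Mac OS Roman, 0xC3 is U+221A (square root) and 0xA3 is U+00A3 (pound sign).
        System.out.println(seen.equals("\u221A\u00A3")); // true
        System.out.println(seen.length());               // 2
    }
}
```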
AND:
NOW:
As far as I know, there is no way to pass the latter option in your pom.xml.
Here is a pending answer for that: enabling UTF-8 encoding for clojure source files
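Whatever the exact option is, it can help to log which default charset the JVM running the servlet container actually picked up. This is only a diagnostic sketch. Older Apple JDKs often reported MacRoman here, while JDK 18 and later default to UTF-8 (JEP 400) unless overridden.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Returns a one-line summary of the JVM's charset-related defaults.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // String.getBytes() and new String(byte[]) without a charset use defaultCharset().
        System.out.println(describe());
    }
}
```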
That is the Mac OS Roman character encoding. (-52 as a signed byte is 0xCC, which is "Ã" in Mac OS Roman.)
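One sequence of mistakes reproduces the exact numbers. Treat it as a hypothesis, not a diagnosis:

1. The browser sends the UTF-8 bytes C3 A3.
2. The container sees no request charset and decodes those bytes as ISO-8859-1 (the Servlet default), which produces "Ã£".
3. A bare `getBytes()` encodes that string with a Mac OS Roman platform default. In that charset, Ã is 0xCC and £ is 0xA3.

The sketch makes each step explicit.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MacRomanRepro {
    static byte[] reproduce(String typed) {
        // 1. Browser submits the form field as UTF-8: C3 A3 for "ã".
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        // 2. Container decodes without a declared charset, as ISO-8859-1: "Ã£".
        String param = new String(wire, StandardCharsets.ISO_8859_1);
        // 3. getBytes() with a MacRoman platform default, made explicit here.
        return param.getBytes(Charset.forName("x-MacRoman"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reproduce("\u00e3"))); // [-52, -93]
    }
}
```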
Some things to check:
- Use `string.getBytes("UTF-8")` and `new String(bytes, "UTF-8")`.
- Call `response.setContentType("text/html; charset=UTF-8");`.
- In a JSP: `<%@ page pageEncoding="UTF-8" %>`
- `<form action="..." accept-charset="UTF-8">`
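The `getBytes`/`new String` advice comes down to naming the charset on both sides of every conversion. Here is a minimal sketch using `StandardCharsets.UTF_8`. Unlike the overloads that take the charset name as a String, it does not need to handle a checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitRoundTrip {
    // Encodes and decodes with the same explicit charset, ignoring the platform default.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e3";
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```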
Since none of that helped:
Set up a request-encoding filter in your web application (web.xml).
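A request-encoding filter only helps if it calls `request.setCharacterEncoding("UTF-8")` before anything reads a parameter. Sometimes that is not possible. For example, on older Tomcat versions, GET query strings follow the connector's `URIEncoding` setting rather than the request encoding. In those cases, a commonly used workaround is to re-encode the mis-decoded string back to its original bytes and decode those as UTF-8. The helper name is hypothetical. The trick works only because ISO-8859-1 maps every byte to exactly one character, so the first decode loses nothing.

```java
import java.nio.charset.StandardCharsets;

public class ParameterRepair {
    // Undoes an ISO-8859-1 decode of bytes that were really UTF-8.
    static String repairLatin1Decoded(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromContainer = "\u00c3\u00a3"; // "Ã£", what the servlet saw for "ã"
        System.out.println(repairLatin1Decoded(fromContainer).equals("\u00e3")); // true
    }
}
```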
Set the encoding in pom.xml: