I faced a parameter encoding issue when submitting a form with the GET method (I can't use POST). Some accented characters were not escaped as expected in the URL, since my page was UTF-8, and the Spring controller received bad characters instead.
I solved this issue by setting accept-charset="ISO-8859-1" on my form, but now I am wondering which charset is safe for all server/browser combinations. Is there a recommended one for my forms and GET URLs?
The problem is that URLs are always encoded as 7-bit ASCII. Because your form sends back characters outside the standard ASCII set via a GET, you have several issues going on.
I recommend you remove the form encoding, use the page's UTF-8 settings for broader character support, and drop in the two meta tags below to make sure you are sending back UTF-8 encoded data, which includes all the characters needed and is easily decoded on the server, as described by the other posters.
nickdos is right. Another way of doing this is using the meta-data tag:
Also keep in mind that when handling the response on the server, the code should use the same encoding.
Example: use stringParamer.getBytes("utf-8") instead of stringParamer.getBytes(), which falls back to the platform default charset.
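To illustrate the point about passing the charset explicitly, here is a minimal sketch. The class name ExplicitCharsetDemo and the helper toUtf8 are hypothetical names chosen for this example; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Hypothetical helper: encode with an explicit charset rather than
    // relying on the JVM's platform default.
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ä" written as a Unicode escape so the source file encoding does not matter.
        String value = "\u00e4";

        byte[] utf8 = toUtf8(value);
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length);   // 2 bytes: 0xC3 0xA4
        System.out.println(latin1.length); // 1 byte: 0xE4
    }
}
```

The same character produces different bytes depending on the charset, which is why both sides have to agree on one.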
And when using Spring make sure the correct encoding is configured for message converters in the DispatcherServlet's configuration file (XYZ_-servlet.xml), e.g.:
This is frustrating (to put it mildly) with servlets. The standard URL encoding must use UTF-8 yet servlets not only default to ISO-8859-1 but don't offer any way to change that with code.
Sure, you can call req.setCharacterEncoding("UTF-8") before you read anything, but for some ungodly reason this only affects the request body, not query string parameters. There is nothing in the servlet request interface to specify the encoding used for query string parameters.
Using ISO-8859-1 in your form is a hack. Using this ancient encoding will cause more problems than it solves, especially since browsers do not support ISO-8859-1 and always treat it as Windows-1252, whereas servlets treat ISO-8859-1 as ISO-8859-1. You will be screwed beyond belief if you go with this.
To change this in Tomcat, for example, you can set the URIEncoding attribute on your <Connector> element in server.xml, e.g. URIEncoding="UTF-8".
If you don't use a container that has these settings, can't change its settings, or have some other issue, you can still make it work because ISO-8859-1 decoding retains full information from the original binary.
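The claim that ISO-8859-1 decoding loses nothing can be checked directly. The sketch below (class and method names are made up for illustration) decodes every possible byte value as ISO-8859-1 and encodes the result back, assuming a standard JDK:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    // Hypothetical check: does bytes -> String -> bytes through ISO-8859-1
    // return exactly the bytes we started with?
    static boolean latin1IsLossless() {
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i; // every possible byte value
        }
        String decoded = new String(original, StandardCharsets.ISO_8859_1);
        byte[] recovered = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(original, recovered);
    }

    public static void main(String[] args) {
        System.out.println(latin1IsLossless()); // true
    }
}
```

Because each byte maps to exactly one character in the range U+0000 to U+00FF, the original bytes can always be recovered from a string that was wrongly decoded this way.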
So let's say the parameter is test=ä. If everything is correctly set, the browser encodes it as test=%C3%A4. Your servlet will incorrectly decode it as ISO-8859-1 and give you the resulting string "Ã¤". If you apply the correction (turn the string back into bytes as ISO-8859-1, then decode those bytes as UTF-8), you get ä back.
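The correction described above can be sketched as follows. The class name QueryParamRepairDemo and the method repair are hypothetical. In a real servlet the input would be whatever request.getParameter returned, and this assumes the container really did decode the query string as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class QueryParamRepairDemo {
    // Hypothetical helper: undo an ISO-8859-1 mis-decoding of UTF-8 bytes.
    static String repair(String misdecoded) {
        // Step 1: recover the raw bytes the browser actually sent.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes with the charset the browser used.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ã¤" as Unicode escapes: what the servlet yields for %C3%A4
        // when it decodes the query string as ISO-8859-1.
        String broken = "\u00c3\u00a4";
        String fixed = repair(broken);
        System.out.println(fixed.equals("\u00e4")); // true: the original "ä"
    }
}
```

Only apply this when you know the value was mis-decoded. Running it on a string that was already decoded correctly as UTF-8 will corrupt it instead.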