I am doing some work for a French client and so need to deal with accented characters, but I'm running into a lot of difficulty. I am hoping the solution is simple and that somebody can point it out to me.
The string: La Forêt pour Témoin
is converted to: La For? pour T?oin
Note the missing character following the accented character - the t following the ê and the m following the é.
I have tried using StringEscapeUtils which was successful at escaping some characters, such as ă. I have also built my own escape function which produces the same results (ă will work, ê will not).
private String escapeChars(String string) {
    // Emit every character as a decimal numeric character reference (&#NNN;).
    StringBuilder result = new StringBuilder();
    for (char c : string.toCharArray()) {
        result.append("&#").append((int) c).append(';');
    }
    return result.toString();
}
The project is running in Eclipse using the App Engine plugin, and I cannot narrow down whether the problem is caused by Java, App Engine, or SQLite.
Any help is appreciated.
EDIT: I have found that strings are already malformed when simply displaying the request parameter from a form (i.e., request.getParameter("string") already has malformed content).
I have tried the meta tag suggested by Daniel with no success. I think you are on the right track, though. The header of the HTML document follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
When accented characters are hard-coded into a JSP they are displayed as intended.
EDIT: I have also added <?xml version="1.0" encoding="UTF-8"?>
to the very start of the page.
I am very close to a solution. I have found that if I change the encoding of the page from within the browser, form data is passed to the server properly. I cannot figure out how to make the browser auto-detect the page encoding.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
RESOLVED: I couldn't work out how to make the browser auto-detect the UTF-8 encoding, which Java defaults to. So I have forced the character encoding to ISO-8859-1 using request.setCharacterEncoding("ISO-8859-1").
Okay, so the first problem is you need to find out where the data is being lost.
You haven't really said where things are going wrong, but I'd expect that if you sort out the character encoding, the rest should fall into place. Maybe SQLite has problems, but I doubt it...
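One way to find where the data is being lost is to log the Unicode code points of the string at each stage (right after request.getParameter, before the database write, after the read) and see where they first change. The following is only a minimal sketch: the class and method names are made up for illustration, and the sample string is written with Unicode escapes so that the encoding of the source file itself cannot interfere.

```java
class CodePointDump {
    /** Returns the code points of s as space-separated U+XXXX values. */
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Foret" with a circumflex on the e (U+00EA), written as an escape.
        System.out.println(dump("For\u00EAt"));
        // prints: U+0046 U+006F U+0072 U+00EA U+0074
    }
}
```

If the dump taken right after reading the parameter already shows something other than U+00EA for the ê, the problem lies in how the request is decoded rather than in the database or the output.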
This can have three causes:
1. It's a GET request and the server isn't configured to use UTF-8 to parse the request URI. It's unclear which server you're using, so here's a Tomcat-targeted answer as an example: set the URIEncoding attribute of the HTTP Connector in /conf/server.xml to UTF-8.
2. It's a POST request, in which case you need to ensure that the servlet container uses UTF-8 to decode the request body. You can do that by calling request.setCharacterEncoding("UTF-8") before reading any request parameter.
3. The console which you're writing the parameter to doesn't support UTF-8. It's unclear which console you're talking about, so here's an Eclipse-targeted answer as an example: in Window > Preferences > General > Workspace > Text File Encoding, set it to UTF-8.
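The effect of a charset mismatch on form data can be reproduced outside any servlet container. The sketch below is a standalone illustration, not servlet code: FormDecodingDemo and its two helpers are hypothetical names, and java.net.URLEncoder / URLDecoder merely stand in for what the browser and the container do with a form-encoded parameter. It assumes the browser submitted the form as ISO-8859-1 while the server decodes it as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormDecodingDemo {
    /** Percent-encodes a value roughly as a browser would for a page in the given charset. */
    public static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes a percent-encoded value using the charset the server assumes. */
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Accented letters are written as escapes so the source file encoding does not matter.
        String original = "La For\u00EAt pour T\u00E9moin";

        String sent = encode(original, "ISO-8859-1");
        System.out.println(sent); // La+For%EAt+pour+T%E9moin

        // Same charset on both sides: the value survives.
        System.out.println(decode(sent, "ISO-8859-1").equals(original)); // true

        // Mismatched charsets: the lone bytes 0xEA and 0xE9 are not valid UTF-8,
        // so the accented characters are lost.
        System.out.println(decode(sent, "UTF-8").equals(original)); // false
    }
}
```

Either side can be changed to make the two charsets agree. Serving the form page as UTF-8 and decoding as UTF-8 keeps the full Unicode range available, whereas forcing ISO-8859-1 on the server only covers characters that exist in that charset.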
Is it possible the string is intact, but you are attempting to print these characters with an en-US localization?
You need to make sure that the HTML that is sent back to the browser has a charset. You should both send back
Content-Type: text/html; charset=UTF-8
as an HTTP response header and include, as the first child element of the head tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Or, if you are using XHTML:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Though just having the meta tag will often fix the problem. Also, make sure that your HTML is valid by using the W3C Markup Validation Service.
See also: FAQ: Weird characters and question marks appear instead of accented characters