UTF-8 decoding problems in Java & Tomcat7

Posted 2019-08-06 12:39

Question:

I'm sending an AJAX request to the server, where the param value is encoded with the JavaScript "escape(...)" function.

The Tomcat server (7.0.42) is configured so that the receiving Connector has URIEncoding="UTF-8". In web.xml I have configured the SetCharacterEncodingFilter as follows:

<filter>
    <filter-name>charencode</filter-name>
    <filter-class>
        org.apache.catalina.filters.SetCharacterEncodingFilter
    </filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>charencode</filter-name>
    <url-pattern>*</url-pattern>
</filter-mapping>

Additionally, I have created a filter to set the response encoding to UTF-8:

@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
    // Set the response encoding before anything is written to the response
    response.setCharacterEncoding("UTF-8");
    chain.doFilter(request, response);
}

There is no issue parsing params that come from the Latin charset, but when I tried Russian, request.getParameter(..) returns null. Additionally, I get this in the logs (I suspect it's coming from the SetCharacterEncodingFilter):

INFO: Character decoding failed. Parameter [usersaid] with value [%u044B%u0432%u0430%u044B%u0432%u0430%u044B%u0432%u044B%u0432%u0430%u044B%u0432%u0430%21] has been ignored. Note that the name and value quoted here may be corrupted due to the failed decoding. Use debug level logging to see the original, non-corrupted values.

And there are no DEBUG-level messages following it (I believe my logger is set up correctly).

Could you please advise? Will be happy to answer questions!

Many thanks, Victor.

Answer 1:

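That string doesn't decode: the %uXXXX sequences are the non-standard output of JavaScript's escape(), not valid percent-encoding. This has nothing to do with your application server. Try these tools to see for yourself: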

http://www.albionresearch.com/misc/urlencode.php
http://meyerweb.com/eric/tools/dencoder/
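
If you'd rather check this locally than with an online tool, here is a minimal sketch using the JDK's own java.net.URLDecoder. Note the assumptions: this is not Tomcat's internal parameter decoder, and the class and method names (EscapeDecodeCheck, decodes) are made up for illustration. It only shows that a standard percent-decoder rejects the %uXXXX form but accepts the UTF-8 form of the same text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapeDecodeCheck {

    /** Returns true if URLDecoder accepts the value as valid percent-encoding. */
    public static boolean decodes(String value) {
        try {
            URLDecoder.decode(value, "UTF-8");
            return true;
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // IllegalArgumentException: '%' was not followed by two hex digits
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened form of the value from the log (output of escape())
        System.out.println(decodes("%u044B%u0432%u0430%21")); // false
        // The same three letters plus "!" as UTF-8 percent-encoding
        System.out.println(decodes("%D1%8B%D0%B2%D0%B0%21")); // true
    }
}
```

The first call fails because "%u0" is not a percent sign followed by two hex digits, which is the same reason the online decoders choke on it.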

So the error looks like it might be on the client side. Make sure you use the correct encoding when URL-encoding. You are probably using something other than UTF-8, which is what you should use.
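
To see what a UTF-8 encoded value should look like on the wire, here is a small sketch in Java. The assumptions: the class and helper names (Utf8ParamEncoding, encode, decode) are hypothetical, and java.net.URLEncoder does form encoding, so its output is close to but not identical to what a browser's encodeURIComponent produces (for example, URLEncoder escapes "!" and turns spaces into "+"). The point is only the shape of the result: each Russian letter becomes two %XX bytes rather than one %uXXXX unit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class Utf8ParamEncoding {

    /** Percent-encodes a parameter value as UTF-8 (form encoding). */
    public static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported on every JVM
            throw new IllegalStateException(e);
        }
    }

    /** Decodes a UTF-8 percent-encoded parameter value. */
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three Cyrillic letters (U+044B, U+0432, U+0430) followed by "!"
        String raw = "\u044B\u0432\u0430!";
        String encoded = encode(raw);
        System.out.println(encoded);                     // %D1%8B%D0%B2%D0%B0%21
        System.out.println(decode(encoded).equals(raw)); // true
    }
}
```

A value in this form is what a Connector with URIEncoding="UTF-8" is set up to decode.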

Here's a thread on correctly encoding Unicode characters: "What is the proper way to URL encode Unicode characters?"