Cannot get Servlet to process request content as UTF-8

Posted 2019-08-10 01:07

Question:

I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8.

Here's what I've set up for configuration.

  • System properties
[user@server ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
  • tomcat5 server.xml
<Connector protocol="HTTP/1.1"
    ...
    URIEncoding="UTF-8"
    useBodyEncodingForURI="true"/>
  • JSP file
<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
...
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
  • Servlet filter
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException
{
    if(request.getCharacterEncoding() == null)
    {
        request.setCharacterEncoding("UTF-8");
    }
    ...
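
The two server.xml attributes concern the request URI and its query string rather than the POST body. URIEncoding is the charset Tomcat's connector uses to decode percent-encoded bytes in the URI, and useBodyEncodingForURI="true" tells it to use the request's body encoding (from the Content-Type header or setCharacterEncoding) for the query string instead. The sketch below uses java.net.URLDecoder and a hypothetical class name to show how the same percent-encoded bytes decode under UTF-8 versus ISO-8859-1; it illustrates the effect and is not Tomcat's own decoding code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {
    // The first submitted word, percent-encoded from its UTF-8 bytes,
    // as a browser would place it in a GET query string.
    static final String ENCODED = "%D0%91%D0%B8%D1%82%D1%8C";

    // Decode the query value with the given charset, roughly what the
    // connector does with the charset chosen via URIEncoding.
    static String decodeAs(String charsetName) {
        try {
            return URLDecoder.decode(ENCODED, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String utf8 = decodeAs("UTF-8");        // 4 Cyrillic letters
        String latin1 = decodeAs("ISO-8859-1"); // 8 Latin-1 characters (mojibake)
        System.out.println(utf8.length() + " " + latin1.length()); // prints "4 8"
    }
}
```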

With some debug logs I know the following:

System.getProperty("file.encoding"): "UTF-8"
java.nio.charset.Charset.defaultCharset(): "UTF-8"
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(): "UTF8"

However, when I submit my form with an input containing "Бить баклуши", I see the following (from my logs):

request.getParameter("myParameter") = Ð\221иÑ\202Ñ\214 баклÑ\203Ñ\210Ð

I know that the request's character encoding (request.getCharacterEncoding()) was null, so my servlet filter explicitly set it to "UTF-8". Also, I'm viewing my logs in a terminal whose encoding I know is set to UTF-8 as well.
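
The logged value is consistent with UTF-8 bytes being decoded as ISO-8859-1: "Б" is the UTF-8 byte pair 0xD0 0x91, and in the log 0xD0 shows up as "Ð" while 0x91 (octal 221) is a control character printed as "\221". The Servlet specification makes ISO-8859-1 the default for request data when no charset is known, so the JVM's file.encoding being UTF-8 does not help here. The stdlib sketch below (class and method names are illustrative) reproduces the garbling and shows that, because ISO-8859-1 maps every byte to one character, the original text can be recovered; that is a diagnostic check, not a substitute for setting the encoding early.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // The text submitted in the question, written with Unicode escapes
    // so this file compiles regardless of the source file encoding.
    static final String SUBMITTED =
            "\u0411\u0438\u0442\u044C \u0431\u0430\u043A\u043B\u0443\u0448\u0438";

    // What happens when UTF-8 form bytes are decoded as ISO-8859-1.
    static String garble(String text) {
        byte[] wireBytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 is a one-byte-per-char mapping, so the original bytes come back intact.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble(SUBMITTED);
        // First four chars as code points: U+00D0 U+0091 U+00D0 U+00B8
        for (int i = 0; i < 4; i++) {
            System.out.printf("U+%04X ", (int) garbled.charAt(i));
        }
        System.out.println();
        System.out.println(repair(garbled).equals(SUBMITTED)); // true
    }
}
```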

What am I missing here? What else do I need to set for the Servlet to correctly process my input as UTF-8? If more information will help, I'll be glad to add more debugging and update this question with it.

Edit:

  • I'm not using Windows Terminal (I'm using PuTTY), so I'm fairly certain the problem is not the tool I'm viewing the logs with. This is confirmed by the fact that when I send the submitted content back to the browser in my response, it shows the same garbage as above.
  • The form is being submitted from IE8.

Solution:

My web.xml definition for my CharsetFilter was too far down (below my servlet configurations and other filters). I moved the filter definition to the very top of the web.xml document and everything worked correctly. See the accepted answer below.

Answer 1:

Edit4 (the final and corrected answer as requested)

Your servlet filter gets applied too late.

A possible correct order in web.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app>
    <!--CharsetFilter start--> 
    <filter>
        <filter-name>Charset Filter</filter-name>
        <filter-class>CharsetFilter</filter-class>
        <init-param>
            <param-name>requestEncoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>
    <!-- The rest is omitted -->
</web-app>
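
Why the filter's position matters can be sketched with the standard library alone. Per the Servlet specification, setCharacterEncoding has no effect once request parameters have been read, and the container builds the filter chain in the order the <filter-mapping> elements appear in web.xml (URL-pattern mappings before servlet-name mappings). So the CharsetFilter's mapping needs to come before any filter or servlet that reads a parameter. FakeRequest below is a hypothetical stand-in, not a servlet API class; it treats the whole body as a single parameter value and skips URL decoding to keep the sketch short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LateEncodingDemo {
    // Hypothetical stand-in for a servlet request: the body is decoded
    // lazily, exactly once, on the first getParameter call.
    static class FakeRequest {
        private final byte[] body;
        private Charset encoding;   // null until setCharacterEncoding is called
        private String parsedValue; // null until the body has been parsed

        FakeRequest(byte[] body) {
            this.body = body;
        }

        void setCharacterEncoding(String name) {
            // Mirrors the servlet rule: once parameters are parsed, this has no effect.
            if (parsedValue == null) {
                encoding = Charset.forName(name);
            }
        }

        String getParameter() {
            if (parsedValue == null) {
                // Fall back to ISO-8859-1, the servlet default, when no encoding was set.
                Charset cs = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
                parsedValue = new String(body, cs);
            }
            return parsedValue;
        }
    }

    static final String VALUE = "\u0411\u0438\u0442\u044C";

    // Correct order: the charset filter runs before anything reads a parameter.
    static String filterRunsFirst() {
        FakeRequest req = new FakeRequest(VALUE.getBytes(StandardCharsets.UTF_8));
        req.setCharacterEncoding("UTF-8");
        return req.getParameter();
    }

    // Wrong order: an earlier filter or servlet reads a parameter first.
    static String filterRunsTooLate() {
        FakeRequest req = new FakeRequest(VALUE.getBytes(StandardCharsets.UTF_8));
        req.getParameter();                // body decoded as ISO-8859-1 here
        req.setCharacterEncoding("UTF-8"); // silently ignored
        return req.getParameter();
    }

    public static void main(String[] args) {
        System.out.println(filterRunsFirst().equals(VALUE));   // true
        System.out.println(filterRunsTooLate().equals(VALUE)); // false
    }
}
```

The "too late" case never throws an error, which matches the symptom in the question: the filter runs, but the parameters were already decoded with the default charset.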


Answer 2:

At first I thought the issue would be easy to settle, but it took me two days to figure it out. Here are my findings; I hope they help.

1) You need the following directive in your JSP:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

If you have many JSP pages, you can instead set the page encoding for all of them once in web.xml, as explained in "How can I cleanly set the pageEncoding of all my JSPs?"

2) Before you read any parameter in your servlet, make sure you have already set the character encoding to UTF-8:

request.setCharacterEncoding("UTF-8");

I did this in my own filter (the first filter in the chain, before calling chain.doFilter).

3) Your database must support UTF-8, so make sure you have already applied the changes to your tables and columns. To check that it works, type in some words in Japanese and save them. If the table stores the content correctly, you are fine.

4) The last and most important point is the connection string to your database. Even though my database and all my tables supported UTF-8, this extra parameter was what finally let me save my content into the database. So be sure to add characterEncoding=UTF8 to your connection string, like this:

jdbc:mysql://127.0.0.1:3306/my_daabase?characterEncoding=UTF8
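
If the connection URL is assembled in code, step 4's parameter can be appended programmatically. The helper below, withCharacterEncoding, is hypothetical and only builds the URL string; it assumes MySQL Connector/J, where characterEncoding is the connection property the answer mentions. It uses "&" instead of "?" when the URL already carries parameters, and leaves an explicit characterEncoding setting alone.

```java
public class JdbcUrlDemo {
    // Hypothetical helper: append characterEncoding to a JDBC URL,
    // choosing '?' or '&' depending on whether parameters already exist.
    static String withCharacterEncoding(String url, String encoding) {
        if (url.contains("characterEncoding=")) {
            return url; // an explicit setting is kept as-is
        }
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + "characterEncoding=" + encoding;
    }

    public static void main(String[] args) {
        // Prints jdbc:mysql://127.0.0.1:3306/my_daabase?characterEncoding=UTF8
        System.out.println(withCharacterEncoding(
                "jdbc:mysql://127.0.0.1:3306/my_daabase", "UTF8"));
    }
}
```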

For forms with enctype="multipart/form-data" you need one extra step. When you read a FileItem with the getString method, be sure to change the call to getString("UTF-8"); then it should work fine.
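
The extra step is needed because, in the usual Commons FileUpload setup, the upload library parses the raw multipart body itself, so field values are not decoded with the request-level encoding. As far as I know, in Commons FileUpload 1.x a no-argument FileItem.getString() uses the charset declared on the part and otherwise falls back to ISO-8859-1, and browsers typically leave that charset undeclared for plain text fields. The stdlib sketch below mimics the two behaviours with hypothetical helpers (decodeDefault, decodeExplicit); the ISO-8859-1 fallback is an assumption about the library version in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MultipartFieldDemo {
    // Assumed library fallback when a part declares no charset.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    // Roughly what a no-argument getString() does: use the part's
    // declared charset if there is one, otherwise the fallback.
    static String decodeDefault(byte[] raw, String declaredCharset) {
        Charset cs = (declaredCharset != null) ? Charset.forName(declaredCharset) : FALLBACK;
        return new String(raw, cs);
    }

    // Roughly what getString("UTF-8") does: the caller names the charset.
    static String decodeExplicit(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String value = "\u0431\u0430\u043A\u043B\u0443\u0448\u0438";
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeDefault(raw, null).equals(value));     // false
        System.out.println(decodeExplicit(raw, "UTF-8").equals(value)); // true
    }
}
```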