I have a website with a message board that lets people post messages and include attachments. The site was choking every time someone wrote a post containing non-ASCII characters. In an effort to solve that, I changed my HTML form code from
enctype="multipart/form-data"
(as I'm accepting file uploads) to:
enctype="multipart/form-data;charset=UTF-8"
This solved the character problem, but it broke file uploads in Firefox 2 through 3.5. Firefox submits all the text the user entered, but not the file data: the form otherwise behaves normally, just as if no file had been attached. Everything works fine in Safari.
I also tried
enctype="multipart/form-data" accept-charset="UTF-8"
...but that had no effect on the character problem.
Any ideas for ways around this?
The problem is not the form data but the filename field, which simply does not work if you need UTF-8 and file data together. So if you need to process the filename on the server, which is common, you are stuck.
If you set
enctype="multipart/form-data;charset=UTF-8"
in your form, Tomcat 6 converts this to the content type application/x-www-form-urlencoded, which is the problem.
It took me ages to track this down, but it looks like it is broken in general: I have tested this with HTTP requests from a web browser and also from .NET, with the same effect.
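For the server-side filename handling mentioned above, here is a minimal sketch in Python. It assumes the form keeps the plain multipart/form-data enctype and that the browser sent the part headers as raw UTF-8 bytes (which browsers generally do when the page holding the form is itself UTF-8). The function name filename_from_part_headers and the sample header are made up for illustration; a real application would normally let its framework's multipart parser do this.

```python
import re
from typing import Optional


def filename_from_part_headers(raw_headers: bytes, charset: str = "utf-8") -> Optional[str]:
    """Extract the uploaded filename from one multipart part's header bytes.

    Assumption: the bytes are encoded in `charset` (the encoding of the page
    that contained the form). Returns None when the part has no filename.
    """
    text = raw_headers.decode(charset, errors="replace")
    match = re.search(
        r'^content-disposition:.*?\bfilename="([^"]*)"',
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return None
    # Some older browsers sent a full client-side path; keep the last component.
    return re.split(r"[\\/]", match.group(1))[-1]


# Hypothetical part header, as a browser might send it for a UTF-8 page.
sample = (
    'Content-Disposition: form-data; name="attachment"; filename="résumé.txt"\r\n'
    "Content-Type: text/plain"
).encode("utf-8")
print(filename_from_part_headers(sample))
```

If the bytes turn out not to be valid UTF-8, errors="replace" yields replacement characters instead of raising, which at least makes a charset mismatch visible rather than fatal.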
charset is not a registered parameter for the multipart/form-data media type. It shouldn't do anything.
According to RFC 2388, the charset of the submitted fields should actually be passed by the browser in a Content-Type header of the form-data subpart. In practice no browser does this.
accept-charset can't be used because it's broken in IE: instead of choosing the charset for the submission, it actually specifies an alternative charset to use, on a per-field basis, when characters do not fit in the primary charset (which is the charset of the current page). This effectively mangles your strings, as you cannot find out which charset IE actually used.
The only effective way to make browsers submit your forms as UTF-8 is to serve the page containing the form as UTF-8, by setting a
Content-Type: text/html;charset=utf-8
header, including a <meta> HTTP-equivalent, or both (can be a good idea if the user saves the page to disc, losing the header information).
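To make the "header, <meta>, or both" advice concrete, here is a minimal sketch in Python that does both. It is only an illustration: the page markup, the /post action, the field names and the build_form_response and app names are assumptions, not anything from the original site. Note that the form keeps the plain multipart/form-data enctype with no charset parameter.

```python
# The page declares UTF-8 in a <meta> HTTP-equivalent, so the encoding
# survives even if the user saves the page and the HTTP header is lost.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Post a message</title>
</head>
<body>
<form method="post" action="/post" enctype="multipart/form-data">
<textarea name="message"></textarea>
<input type="file" name="attachment">
<input type="submit" value="Post">
</form>
</body>
</html>
"""


def build_form_response():
    """Return (headers, body) for the form page, served as UTF-8."""
    body = FORM_PAGE.encode("utf-8")
    headers = [
        # The HTTP header declares the same charset as the <meta> tag.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


def app(environ, start_response):
    """Tiny WSGI application that serves the form page."""
    headers, body = build_form_response()
    start_response("200 OK", headers)
    return [body]
```

The app callable can be tried locally with wsgiref.simple_server.make_server from the standard library. The point of the sketch is only that the header, the <meta> tag and the actual byte encoding of the page all agree on UTF-8.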