HTML forms: issues combining charset with enctype

Posted 2019-07-16 11:27

I have a website with a message board. The board lets people post messages and include attachments. My site was hiccuping every time someone wrote a post containing non-ASCII characters. In an effort to solve it, I changed my HTML form code from

enctype="multipart/form-data"

(as I'm accepting file uploads) to:

enctype="multipart/form-data;charset=UTF-8"

This solved the character problem, but it broke file uploads in Firefox 2 through 3.5. Firefox sends all the text that the user submits, but not the file data. The submission otherwise behaves normally, just as if no file had been attached. Everything works fine in Safari.

I also tried

enctype="multipart/form-data" accept-charset="UTF-8"

...but that had no effect on the character problem.

Any ideas for ways around this?

2 answers
SAY GOODBYE
#2 · 2019-07-16 11:39

The problem is not the form data but the filename field, which simply does not work if you need both UTF-8 and file data. If you need to process the filename on the server, which is common, you are stuck.

If you set enctype="multipart/form-data;charset=UTF-8" in your form, Tomcat 6 converts the content type to application/x-www-form-urlencoded, which is the problem.

It took me ages to track this down, but it looks like this is broken in general. I have tested it with HTTP requests from a web browser and from .NET, with the same effect.

兄弟一词,经得起流年.
#3 · 2019-07-16 11:44

charset is not a registered parameter for the multipart/form-data media type. It shouldn't do anything.

According to RFC 2388, the charset of the submitted fields should actually be passed by the browser in a Content-Type header of each form-data subpart. In practice no browser does this.
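
To make the RFC 2388 point concrete, here is a minimal sketch of what a request body would look like if a client did label each text field with its own charset. The helper name `build_multipart`, the boundary string, and the field name `message` are assumptions made up for illustration; this is not what any browser actually sends.

```python
# Hypothetical helper (not from any library): builds a multipart/form-data
# body by hand to show where RFC 2388 expects the per-field charset to go.
def build_multipart(fields, boundary="example-boundary"):
    """fields: list of (name, text) pairs; every text value is sent as UTF-8."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, text in fields:
        lines.append(delimiter)
        lines.append(
            b'Content-Disposition: form-data; name="' + name.encode("ascii") + b'"'
        )
        # The per-part header described by RFC 2388, which browsers omit.
        lines.append(b"Content-Type: text/plain; charset=UTF-8")
        lines.append(b"")  # blank line separates part headers from part data
        lines.append(text.encode("utf-8"))
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return b"\r\n".join(lines)


# Assumed field name "message", for illustration only.
body = build_multipart([("message", "h\u00e9llo")])
```

Because real browsers leave that per-part header out, a server receiving the form has no in-band indication of the encoding and has to rely on knowing which charset the page was served in.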

accept-charset can't be used because it's broken in IE: instead of choosing the charset for the submission, it specifies an alternative charset to use, on a per-field basis, when characters do not fit in the primary charset (which is the charset of the current page). This effectively mangles your strings, as you cannot find out which charset IE actually used.

The only effective way to make browsers submit your forms as UTF-8 is to serve the page containing the form as UTF-8, by setting a Content-Type: text/html;charset=utf-8 header, by including a <meta> http-equiv equivalent, or both (doing both can be a good idea in case the user saves the page to disk, losing the header information).
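
As a rough sketch of the "both" option, the following serves a form page with the UTF-8 Content-Type header and a matching <meta> tag, while leaving enctype as plain multipart/form-data. It uses Python's standard http.server module only for illustration; the `/post` action, the field names, and the `render_form` helper are assumptions, not part of the original site.

```python
from http.server import BaseHTTPRequestHandler

# Assumed page markup: the form keeps a plain multipart/form-data enctype,
# and the charset is declared on the page itself rather than on the form.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Post a message</title>
</head>
<body>
<form action="/post" method="post" enctype="multipart/form-data">
<textarea name="message"></textarea>
<input type="file" name="attachment">
<input type="submit" value="Post">
</form>
</body>
</html>
"""


def render_form():
    """Return (headers, body) for the form page, both declaring UTF-8."""
    body = FORM_PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


class FormHandler(BaseHTTPRequestHandler):
    """Serves the form page on any GET request."""

    def do_GET(self):
        headers, body = render_form()
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To try it locally (the port is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), FormHandler).serve_forever()
```

The header covers normal browsing, and the <meta> tag keeps the declaration with the page if it is saved to disk and reopened without HTTP headers.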
