Detecting the character encoding of an HTTP POST r

Posted 2019-01-03 13:35

I'm building a web service and have a node that accepts a POST to create a new resource. The resource expects one of two content-types - an XML format I'll be defining, or form-encoded variables.

The idea is that consuming applications can POST XML directly and benefit from better validation etc., but there's also an HTML interface that will POST the form-encoded stuff. Obviously the XML format has a charset declaration, but I can't see how I detect the form's charset just from looking at the POST.

A typical post to the form from Firefox looks like this:

POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41

field1=value1&field2=value2&field3=value3

Which doesn't seem to contain any useful indication of the character set.

From what I can see, the application/x-www-form-urlencoded type is entirely defined in HTML, which just lays out the %-encoding rules, but doesn't say anything about what charset the data should be in.

Basically, is there any way of telling the character set of the POST if I don't know which character set the original HTML page was served in? Otherwise I'll have to guess the character set from the bytes that are present, and from what I can tell that's always a bit unreliable.
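To make the "guessing" fallback concrete, here is a minimal sketch in Java. It assumes only two candidate encodings: it percent-decodes the raw value to bytes, tries a strict UTF-8 decode, and falls back to ISO-8859-1 if the bytes are not valid UTF-8. The class and method names (`FormValueGuess`, `percentDecode`, `decodeGuessing`) are hypothetical, and the heuristic can still be wrong for short ISO-8859-1 strings that happen to be valid UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses between UTF-8 and ISO-8859-1 for one form value.
class FormValueGuess {

    // Turn a raw form-encoded value into bytes without assuming a charset.
    // Assumes the raw value is ASCII, as a form-encoded body normally is.
    static byte[] percentDecode(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                int hi = Character.digit(raw.charAt(i + 1), 16);
                int lo = Character.digit(raw.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            // '+' means space in form encoding; everything else is kept as-is.
            out.write(c == '+' ? ' ' : c);
        }
        return out.toByteArray();
    }

    // Strict UTF-8 first; if the bytes are malformed, treat them as ISO-8859-1.
    static String decodeGuessing(String raw) {
        byte[] bytes = percentDecode(raw);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }
}
```

For example, `FormValueGuess.decodeGuessing("caf%C3%A9")` and `FormValueGuess.decodeGuessing("caf%E9")` both come back as "café", the first via the UTF-8 path and the second via the fallback.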

3 answers
手持菜刀,她持情操
Reply 2 · 2019-01-03 14:09

Try setting the charset on your Content-Type:

httpCon.setRequestProperty( "Content-Type", "multipart/form-data; charset=UTF-8; boundary=" + boundary );
对你真心纯属浪费
Reply 3 · 2019-01-03 14:18

The charset used in the POST will match the charset specified for the HTML page hosting the form. Hence, if the page containing your form is served as UTF-8, that is the encoding used for the posted content. The URL encoding is applied after the values have been converted to octets in that character encoding.
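A small Java sketch of that ordering, showing that the same character produces different %-escapes depending on the charset used for the octets. It relies on `URLEncoder.encode(String, Charset)`, which needs Java 10 or later; the class name `FormEncodingDemo` and its helper methods are made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: characters -> octets (charset) -> %-escapes.
class FormEncodingDemo {

    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String encodeLatin1(String value) {
        return URLEncoder.encode(value, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café"

        System.out.println(encodeUtf8(value));   // caf%C3%A9
        System.out.println(encodeLatin1(value)); // caf%E9

        // Decoding with the wrong charset yields mojibake rather than an error:
        // the two UTF-8 octets are read as two separate Latin-1 characters.
        String wrong = URLDecoder.decode(encodeUtf8(value), StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 5, not 4
    }
}
```

This is why the server has to know (or guess) the charset: the escaped bytes alone do not say which encoding produced them.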

可以哭但决不认输i
Reply 4 · 2019-01-03 14:29

The default encoding of an HTTP POST is ISO-8859-1.

Otherwise, you have to look at the Content-Type header, which will then look like

Content-Type: application/x-www-form-urlencoded ; charset=UTF-8
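If the header does carry a charset parameter, pulling it out on the server side could look like the sketch below. It is plain string handling, not a full media-type parser, and the names `ContentTypeCharset` and `charsetOf` are invented for this example. The fallback charset is whatever default you decide on; bear in mind that browsers often omit the parameter for form posts, as the request in the question shows.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the charset parameter from a Content-Type value.
class ContentTypeCharset {

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // parts[0] is the media type; the rest are name=value parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            String value = param.substring(eq + 1).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the fallback.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String header = "application/x-www-form-urlencoded ; charset=UTF-8";
        System.out.println(charsetOf(header, StandardCharsets.ISO_8859_1)); // UTF-8
    }
}
```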

You could perhaps declare your form with

<form enctype="application/x-www-form-urlencoded;charset=UTF-8">

or

<form accept-charset="UTF-8">

to force the encoding.

Some references:

http://www.htmlhelp.com/reference/html40/forms/form.html

http://www.w3schools.com/tags/tag_form.asp
