How do you know what encoding the user is inputting?

I read Joel's article about character sets and so I'm taking his advice to use UTF-8 on my web page and in my database. What I can't understand is what to do with user input. As Joel says, "It does not make sense to have a string without knowing what encoding it uses." But how do I know what encoding the user input string uses? If I have

<input type="text" name="atextfield" >

on my page, how do I know what encoding I'm getting from the user? What if the user puts in some special ASCII symbol, like ♣ or ™ or something? Is there some way I can detect that user input gave me something unrecognized in UTF-8? Is there some standard for how to handle this sort of thing?

Tags: php html encoding utf-8

3 Answers

做自己的国王

#2 · 2019-03-06 15:45

If your web page uses UTF-8, the browser will convert the input to UTF-8 for you. So even special characters such as ♣ or ™ will be submitted as UTF-8.

However, you can never rule out a user who manually switches the page encoding back to ISO-8859-*.

You can make use of mb_detect_encoding, but it is not 100% bulletproof.

/* Example input; in practice this is the submitted form value */
$str = $_POST['atextfield'] ?? '';

/* Detect character encoding with current detect_order */
echo mb_detect_encoding($str);

/* "auto" is expanded according to mbstring.language
   (for Japanese: "ASCII,JIS,UTF-8,EUC-JP,SJIS") */
echo mb_detect_encoding($str, "auto");

/* Specify encoding_list character encoding by comma separated list */
echo mb_detect_encoding($str, "JIS, eucjp-win, sjis-win");

/* Use array to specify encoding_list */
$ary = ["ASCII", "JIS", "EUC-JP"];
echo mb_detect_encoding($str, $ary);
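
Since the question also asks whether input that is not valid UTF-8 can be spotted, here is a minimal sketch of a validity check. The helper name isValidUtf8 and the sample strings are made up for illustration; mb_check_encoding and the strict flag of mb_detect_encoding are part of the mbstring extension, which is assumed to be loaded.

```php
<?php
// Hypothetical helper: true when $input is a well-formed UTF-8 byte sequence.
function isValidUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

$good = "Clubs \u{2663} and trademark \u{2122}"; // UTF-8 encoded symbols
$bad  = "caf\xE9";                                // lone ISO-8859-1 byte, invalid as UTF-8

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)

// With the third argument set to true, detection is strict: an encoding is
// only reported if the bytes are actually valid in it.
var_dump(mb_detect_encoding($bad, ['UTF-8', 'ISO-8859-1'], true));
```

A validity check only tells you the bytes could be UTF-8; it cannot prove what the user's browser really sent.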


仙女界的扛把子

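#3 · 2019-03-06 15:51

Check the HTTP request headers to discover the character encoding: the Content-Type header may carry a charset parameter, although browsers often omit it for form submissions.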
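
As a rough sketch of what checking the headers could look like in PHP: the helper charsetFromContentType is a hypothetical name, and the code assumes the request's Content-Type value is exposed as $_SERVER['CONTENT_TYPE']. Expect null in many real requests, because the charset parameter is frequently missing.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value, or returns null when the header does not declare one.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Matches e.g. "...; charset=utf-8" or '...; charset="utf-8"'.
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// In a real request the header value would come from the server environment.
$declared = charsetFromContentType($_SERVER['CONTENT_TYPE'] ?? null);
var_dump($declared);
```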


一纸荒年 Trace。

#4 · 2019-03-06 16:04

Don't try to detect the encoding; convert all user-entered text to UTF-8 in your application. You can do everything possible on your side: configure your web server to send UTF-8 pages and UTF-8 headers, configure your application to handle all text as UTF-8, tweak your filesystem (if necessary) to handle text files as UTF-8, and configure your database. But you simply have no real control over the user's end. You can suggest the proper character encoding in your HTML forms, like the following, but it's not really enforceable on the user's end:

<form action="/index.php" method="post" accept-charset="UTF-8"></form>

Unless detecting the encoding of user input is the whole purpose of your application, it's a fool's errand to try. Assume the encoding is wrong and convert it to UTF-8 in your app, just as you should assume user input is malicious and clean it up before you attempt to insert it into your database.
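
One possible way to do that conversion in PHP is sketched below. The function name toUtf8 and the choice of ISO-8859-1 as the fallback encoding are assumptions for illustration, not something the answer specifies; pick the fallback that matches your audience.

```php
<?php
// Hypothetical helper: returns $input as valid UTF-8.
// Assumption: bytes that are not valid UTF-8 are treated as $fallback.
function toUtf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already well-formed UTF-8, leave untouched
    }
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

$fromLatin1 = toUtf8("caf\xE9");     // ISO-8859-1 bytes for "café"
$fromUtf8   = toUtf8("caf\xC3\xA9"); // already UTF-8

var_dump($fromLatin1 === $fromUtf8); // bool(true)
```

Checking validity first avoids double-encoding text that arrived as UTF-8 in the first place.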

In most languages that have UTF-8 properly implemented, ASCII characters will survive conversion, so don't worry about that either.
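
A quick way to convince yourself of this in PHP, assuming the mbstring extension is available: ASCII bytes are encoded identically in UTF-8 and in ISO-8859-1, so the conversion leaves them unchanged. The sample string is arbitrary.

```php
<?php
$ascii = "Hello, world! 123";

// Converting pure ASCII from ISO-8859-1 to UTF-8 yields the same bytes.
$converted = mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1');

var_dump($converted === $ascii);              // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8')); // bool(true)
```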

