PHP - detecting the user supplied character's

2019-09-02 05:12发布

问题:

Is it possible to detect the user's string's char set?

If not, how about the next question..

Are there reliable built-in PHP functions that can accurately tell if the user supplied string ( be it supplied thru get/post/cookie etc), are in a UTF-8 or not? In other words, can I do something like

is_utf8($_GET['first_name'])

Is there anyway this function could produce a TRUE where in reality the first_name was not in UTF-8?

回答1:

Regarding 1:

You can give mb_detect_encoding a try, but it's pretty much a shot in the dark. An "encoded" string is just a bunch of bytes. Such byte sequences are often equally valid in any number of different encodings. It's therefore by definition not possible to detect an unknown encoding reliably, you can only guess. For this reason there exist meta information such as HTTP headers which should communicate the encoding of the transferred content. Check those if available.

Regarding 2:

mb_check_encoding($var, 'UTF-8') will tell you whether the string is a valid UTF-8 string. As far as I've seen, in recent versions of PHP it does what it says on the tin. That still doesn't mean the string is necessarily really a UTF-8 string, it just means the byte sequence is in an order that is valid in UTF-8.