Determine if UTF-8 text is all ASCII?

Posted 2020-02-12 07:36

What's the fastest way, in PHP, to determine if some given UTF-8 text is purely ASCII or not?

3 answers
闹够了就滚
#2 · 2020-02-12 08:08
function isAscii($str) {
    // preg_match() returns 1, 0, or false; compare against 1 to get a real boolean
    return preg_match('/^[\x00-\x7F]*$/', $str) === 1;
}

// rejects ASCII control characters other than tab, LF, and CR
function isAsciiText($str) {
    return preg_match('/^[\x09\x0A\x0D\x20-\x7E]*$/', $str) === 1;
}
疯言疯语
#3 · 2020-02-12 08:24

A possibly faster function would use a negated character class, since the regex can stop as soon as it hits the first non-ASCII byte and there's no need to capture anything internally:

function isAscii($str) {
    return 0 == preg_match('/[^\x00-\x7F]/', $str);
}

Without regex (based on my comment):

function isAscii($str) {
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if (ord($str[$i]) > 127) return false;
    }
    return true;
}

But I'd have to ask, why are you so concerned about speed? Use the more readable and easier to understand version, and only worry about optimizing it when you know it's a problem...

Edit:

Then the fastest will likely be mb_check_encoding:

function isAscii($str) {
    return mb_check_encoding($str, 'ASCII');
}
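
Whether mb_check_encoding really is fastest depends on the PHP version and the input, so it may be worth measuring. Below is a rough timing sketch, not a rigorous benchmark. The helper `timeIt`, the iteration count, and the sample input are all made up for illustration, and the code assumes PHP 7.4+ with the mbstring extension loaded.

```php
<?php
// Hypothetical helper: call $fn on $input $iterations times, return elapsed milliseconds.
function timeIt(callable $fn, string $input, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $start) / 1e6;
}

// Two of the approaches from this thread, wrapped as closures.
$candidates = [
    'negated class'     => fn(string $s): bool => preg_match('/[^\x00-\x7F]/', $s) === 0,
    'mb_check_encoding' => fn(string $s): bool => mb_check_encoding($s, 'ASCII'),
];

// Illustrative input: pure ASCII, so each check has to scan the whole string.
$input = str_repeat('plain ascii text ', 1000);

foreach ($candidates as $name => $fn) {
    printf("%-18s %8.2f ms\n", $name, timeIt($fn, $input));
}
```

Timings will vary between machines and runs, so compare the numbers relative to each other rather than treating them as absolute.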
别忘想泡老子
#4 · 2020-02-12 08:25

Check if any byte is greater than 0x7f, or any character is above U+007F.
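
Both views can be sketched in PHP as follows. The function names `hasHighByte` and `hasNonAsciiCodePoint` are made up for illustration, and the second function assumes the input is valid UTF-8, as the question states: with the /u flag, preg_match() returns false on malformed input, which this sketch would report as "no non-ASCII code point found".

```php
<?php
// Byte-level view: UTF-8 encodes every code point above U+007F using
// only bytes >= 0x80, so a single high byte is enough to reject.
function hasHighByte(string $str): bool
{
    return preg_match('/[\x80-\xFF]/', $str) === 1;
}

// Character-level view: decode as UTF-8 (the /u flag) and look for a
// code point outside U+0000..U+007F. Assumes $str is valid UTF-8.
function hasNonAsciiCodePoint(string $str): bool
{
    return preg_match('/[^\x{0}-\x{7F}]/u', $str) === 1;
}

var_dump(hasHighByte('hello'));                  // pure ASCII
var_dump(hasHighByte("h\xC3\xA9llo"));           // contains U+00E9 (é)
var_dump(hasNonAsciiCodePoint("h\xC3\xA9llo"));  // same string, character view
```

For valid UTF-8 the two checks agree, so the byte-level one is usually enough and does not require decoding.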
