Multi-byte safe wordwrap() function for UTF-8

2019-01-09 01:58发布

PHP's wordwrap() function doesn't work correctly for multi-byte strings like UTF-8.

There are a few examples of mb safe functions in the comments, but with some different test data they all seem to have some problems.

The function should take the exact same parameters as wordwrap().

Specifically be sure it works to:

  • cut mid-word if $cut = true, don't cut mid-word otherwise
  • not insert extra spaces in words if $break = ' '
  • also work for $break = "\n"
  • work for ASCII, and all valid UTF-8

8条回答
Deceive 欺骗
2楼-- · 2019-01-09 02:14

Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.

/**
 * Multi-byte safe version of wordwrap()
 * Seems to me like wordwrap() is only broken on UTF-8 strings when $cut = true
 * @return string
 */
function wrap($str, $len = 75, $break = " ", $cut = true) { 
    $len = (int) $len;

    if (empty($str))
        return ""; 

    $pattern = "";

    if ($cut)
        $pattern = '/([^'.preg_quote($break).']{'.$len.'})/u'; 
    else
        return wordwrap($str, $len, $break);

    return preg_replace($pattern, "\${1}".$break, $str); 
}
查看更多
小情绪 Triste *
3楼-- · 2019-01-09 02:22

Here is the multibyte wordwrap function i have coded taking inspiration from of others found on the internet.

function mb_wordwrap($long_str, $width = 75, $break = "\n", $cut = false) {
    $long_str = html_entity_decode($long_str, ENT_COMPAT, 'UTF-8');
    $width -= mb_strlen($break);
    if ($cut) {
        $short_str = mb_substr($long_str, 0, $width);
        $short_str = trim($short_str);
    }
    else {
        $short_str = preg_replace('/^(.{1,'.$width.'})(?:\s.*|$)/', '$1', $long_str);
        if (mb_strlen($short_str) > $width) {
            $short_str = mb_substr($short_str, 0, $width);
        }
    }
    if (mb_strlen($long_str) != mb_strlen($short_str)) {
        $short_str .= $break;
    }
    return $short_str;
}

Dont' forget to configure PHP for using UTF-8 with :

ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

I hope this will help. Guillaume

查看更多
登录 后发表回答