Comparing UTF-8 strings

Posted 2019-03-11 03:14

I'm trying to compare two strings, let's say Émilie and Zoey. Alphabetically 'E' comes before 'Z', but É is not an ASCII character: in UTF-8 it is stored as the two bytes 0xC3 0x89, which compare greater than Z (0x5A). So a plain if ($str1 > $str2) doesn't work.

I also tried if (strcmp($str1, $str2) > 0), but it still doesn't work. So I'm looking for a native way to compare strings that contain UTF-8 characters.
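To see why both attempts fail, here is a small sketch that prints the first byte of each name and runs the two comparisons from the question. It assumes the PHP source file itself is saved as UTF-8; $a and $b are just illustrative variable names.

```php
<?php
// Both > and strcmp() compare strings byte by byte.
// In UTF-8, 'É' is the two bytes 0xC3 0x89, while 'Z' is the single byte 0x5A.
$a = 'Émilie';
$b = 'Zoey';

printf("First byte of %s: 0x%02X\n", $a, ord($a[0])); // 0xC3
printf("First byte of %s: 0x%02X\n", $b, ord($b[0])); // 0x5A

// Because 0xC3 > 0x5A, 'Émilie' is treated as greater than 'Zoey'.
var_dump(strcmp($a, $b) > 0); // bool(true)
var_dump($a > $b);            // bool(true)
```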

3 answers
#2 · 2019-03-11 03:51

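There is no native way to do this in core PHP, but the intl extension (available from PECL for PHP 5.2, and bundled with PHP since 5.3, although it may need to be enabled) provides the Collator class: http://php.net/manual/de/class.collator.php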

$c = new Collator('fr_FR');
if ($c->compare('Émilie', 'Zoey') < 0) { echo 'Émilie < Zoey'; }
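The same Collator can also sort a whole list. This is a sketch assuming the intl extension is enabled; 'fr_FR' is only an example locale, and $names/$copy are illustrative variable names.

```php
<?php
// Requires the intl extension. 'fr_FR' is an example locale; pick the one matching your data.
$names = array('Zoey', 'Émilie', 'Amber');

$collator = new Collator('fr_FR');

// Collator::sort() sorts the array in place using the locale's collation rules,
// so 'É' is ordered together with 'E' instead of after 'Z'.
$collator->sort($names);
print_r($names); // order: Amber, Émilie, Zoey

// Collator::compare() also works as a usort() callback when you need custom sorting.
$copy = array('Zoey', 'Émilie', 'Amber');
usort($copy, array($collator, 'compare'));
print_r($copy);
```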
虎瘦雄心在
#3 · 2019-03-11 04:01

IMPORTANT

This answer is meant for situations where it's not possible to install or enable the 'intl' extension, and it only sorts strings by replacing accented characters with non-accented ones. To sort accented characters according to a specific locale, a Collator is the better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You can try converting both strings to ASCII using iconv() with the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then compare the converted strings.

See the documentation here:

http://www.php.net/manual/en/function.iconv.php

[Updated in response to @Esailija's remark] I overlooked that //TRANSLIT can translate accented characters in unexpected ways, depending on the locale and the iconv implementation (for example, producing 'e or ? instead of e). This problem is discussed in this question: php iconv translit for removing accents: not working as excepted?

To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the converted string.

<?php

// The //TRANSLIT result can depend on the current locale, which must be installed on the system
setlocale(LC_ALL, 'fr_FR');

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);


$converted = array();

foreach($names as $name) {
    // Transliterate to ASCII, then drop anything that is not a word character or whitespace
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>'; print_r($converted);

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )
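The sample above prints the converted strings, so the accents are lost in the result. A possible variation (a sketch, not part of the original answer) sorts the original strings and uses the transliterated form only as the comparison key. asciiKey() and compareAscii() are hypothetical helper names; a named callback is used instead of a closure to stay compatible with PHP 5.2. As noted above, the exact order still depends on how //TRANSLIT behaves on your system.

```php
<?php
// The //TRANSLIT result can depend on the locale and on the iconv implementation.
setlocale(LC_ALL, 'fr_FR');

// Hypothetical helper: the ASCII key used only for comparing, never for output.
function asciiKey($s)
{
    return preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $s));
}

// Hypothetical usort() callback comparing the ASCII keys of two original strings.
function compareAscii($a, $b)
{
    return strcmp(asciiKey($a), asciiKey($b));
}

$names = array('Zoey', 'Émilie', 'Amber');

// Sorts the original strings, so 'Émilie' keeps its accent in the output.
usort($names, 'compareAscii');

echo '<pre>'; print_r($names);
```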
Luminary・发光体
#4 · 2019-03-11 04:03

Here's something that works for me, although I'm not sure it will handle the special characters that other languages use.

I'm just using the mb_strpos() function and checking its result. Keep in mind that this tells you whether one string contains the other, not which one sorts first. I guess that's as close as you can get to a native comparison of UTF-8 strings:

// Lower-case both sides so the search is case-insensitive
if (mb_strpos(mb_strtolower($search_in), mb_strtolower($search_for)) !== false) {
    // do stuff
}
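mbstring also has mb_stripos(), which performs the case-insensitive search itself, so neither string has to be lower-cased by hand. A minimal sketch, assuming the mbstring extension is available; containsIgnoreCase() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: case-insensitive "does $haystack contain $needle?" for UTF-8 text.
function containsIgnoreCase($haystack, $needle)
{
    // mb_stripos() returns the position of the first match, or false if there is none.
    return mb_stripos($haystack, $needle, 0, 'UTF-8') !== false;
}

var_dump(containsIgnoreCase('Émilie and another word', 'ÉMILIE')); // bool(true)
var_dump(containsIgnoreCase('Émilie and another word', 'Zoey'));   // bool(false)
```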