How to emulate MySQL's utf8_general_ci collation in PHP

Posted 2019-04-09 01:47

Basically, if two strings would evaluate as the same in my database, I'd also like to be able to check that at the application level. For example, if somebody enters "bjork" in a search field, I want PHP to be able to match that to the string "Björk" just as MySQL would.

I'm guessing PHP has no direct equivalent to MySQL's collation options, and that the easiest approach is to write a simple function that normalizes the strings: strtolower() to make them uniformly lower-case, and strtr() to replace accented multibyte characters with their ASCII equivalents.

Is that an accurate assumption? Does anybody have a foolproof array handy to pass as the second parameter of strtr() to normalize strings the way the various MySQL collations do (specifically, for my current needs, utf8_general_ci)? Failing that, where can I find documentation of exactly how the different MySQL collations treat each character? (I read somewhere that some collations treat ß as s and others as ss, for instance, but that source didn't list every character.)

3 answers
Animai°情兽
Post #2 · 2019-04-09 02:17

Here's what I've been using, but I have yet to test it for complete consistency with MySQL.

function collation_conform($string, $collation = 'utf8_general_ci')
{
    if ($collation !== 'utf8_general_ci') {
        throw new InvalidArgumentException("Unsupported collation: $collation");
    }

    // Pass non-string values (null, ints, ...) through untouched.
    if (!is_string($string)) {
        return $string;
    }

    // Fold accented Latin-1 letters to their base letter. Not yet verified
    // entry-by-entry against MySQL; note that MySQL documents ß = s
    // (not ss) for utf8_general_ci.
    $string = strtr($string, array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'D', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
        'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
        'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
        'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'s', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
        'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
        'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'));

    // mb_strtolower() (mbstring extension) also lowercases multibyte letters
    // left over after the table, which strtolower() does not reliably do.
    return mb_strtolower($string, 'UTF-8');
}
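If you'd rather not maintain the lookup table by hand, one possible alternative (a swapped-in technique, not part of the answer above) is to decompose strings with the intl extension's Normalizer and strip the combining marks. This is a rough sketch assuming the intl and mbstring extensions are installed; `fold_accents_and_case()` is a hypothetical name, and the result only approximates utf8_general_ci: letters with no canonical decomposition, such as ß, Æ, Ø, Ð and Þ, pass through unchanged apart from lowercasing.

```php
<?php
// Hypothetical helper: approximate accent- and case-insensitive folding.
// Assumes the intl (Normalizer) and mbstring extensions are available.
// NOT an exact emulation of utf8_general_ci (e.g. ß stays ß).
function fold_accents_and_case(string $s): string
{
    // FORM_D splits "ö" into "o" followed by a combining diaeresis (U+0308).
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input could not be normalized (invalid UTF-8?)');
    }
    // Remove every nonspacing combining mark (Unicode category Mn).
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    return mb_strtolower($stripped, 'UTF-8');
}

var_dump(fold_accents_and_case('Björk') === fold_accents_and_case('bjork')); // bool(true)
```

The trade-off versus a hand-written table is coverage: decomposition handles accented letters from any script automatically, but it cannot reproduce MySQL-specific equivalences for letters that Unicode does not decompose.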
甜甜的少女心
Post #3 · 2019-04-09 02:31

Have you looked at PHP's Collator class (part of the intl extension)? http://www.php.net/manual/en/class.collator.php
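A short sketch of that class's object-oriented API, assuming the intl extension is loaded. It also illustrates that the locale matters: in the Swedish tailoring ö is a separate letter sorted after z, so primary strength no longer treats it as equal to o.

```php
<?php
// Requires the intl extension.
$en = new Collator('en_US');
$en->setStrength(Collator::PRIMARY); // compare base letters only (ignore accents and case)

$sv = new Collator('sv_SE');
$sv->setStrength(Collator::PRIMARY);

// compare() returns 0 when the two strings are equal at the chosen strength.
var_dump($en->compare('Björk', 'bjork')); // int(0)
var_dump($sv->compare('Björk', 'bjork') === 0); // bool(false): ö is its own letter in Swedish
```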

女痞
Post #4 · 2019-04-09 02:36

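Try the following code. It uses the intl extension's Collator at primary strength, which ignores differences in case and accents.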

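// Requires the intl extension.
$s1 = 'Björk';
$s2 = 'bjork';

var_dump(
    is_same_string($s1, $s2)
); // bool(true)

function is_same_string($str, $str2, $locale = 'en_US')
{
    $coll = collator_create($locale);
    // PRIMARY strength compares base letters only: accents and case are ignored.
    collator_set_strength($coll, Collator::PRIMARY);
    // collator_compare() returns 0 when the strings are equal at this strength.
    return 0 === collator_compare($coll, $str, $str2);
}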
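Since the question's use case is a search field, here is a rough sketch of applying the same primary-strength comparison to a list of candidates. `find_matches()` is a hypothetical helper; it assumes the intl extension and only tests whole-string equality, not substring matches. Keep in mind that ICU's rules are not identical to utf8_general_ci's: for example, ICU generally treats ß as equivalent to ss, whereas MySQL documents ß = s for utf8_general_ci.

```php
<?php
// Hypothetical helper; requires the intl extension.
// Returns the entries of $candidates that equal $query at PRIMARY strength
// (accent- and case-insensitive). Whole-string matches only.
function find_matches(array $candidates, string $query, string $locale = 'en_US'): array
{
    $coll = new Collator($locale);
    $coll->setStrength(Collator::PRIMARY);

    $matches = array_filter($candidates, function ($item) use ($coll, $query) {
        return $coll->compare($item, $query) === 0;
    });

    // array_filter() preserves keys; reindex to get a plain list.
    return array_values($matches);
}

print_r(find_matches(array('Björk', 'Bjorn', 'BJORK', 'Sigur Rós'), 'bjork'));
```

Creating the Collator once per call keeps the sketch simple; for large lists you would likely create it once and reuse it.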