Convert ASCII and UTF-8 to non-special characters

Posted 2020-06-20 08:05

Question:

So I'm building a website that is using a database feed that was already set up and has been used by the client for all their other websites for quite some time.

They fill this database through an external program, and I have no way to change the way I get my data.

Now I have the following problem: sometimes I get strings in UTF-8 and sometimes in ASCII with HTML entities (I hope I've got these terms right, they're still a bit vague to me sometimes).

So I could get either this: Scénic or Sc&eacute;nic.

Now the problem is, I have to convert this to non-special characters (so it would become Scenic) for URLs.

I don't think there's a function for converting é to e (if there is, do tell), so I'll probably need to create an array containing all the source and destination characters. The bigger problem is converting &eacute; to é without breaking é when it comes through that same function.

Or should I just create an array containing everything (so for example: array('é'=>'e','&eacute;'=>'e'); etc.)?

I know how to get &eacute; to é by doing utf8_encode(html_entity_decode('&eacute;')); however, putting é through this same function will return Ã©.
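To make the double-encoding concrete, here is a minimal sketch of that approach. It assumes the entity is decoded to ISO-8859-1 bytes first, and it uses iconv from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode (which performs the same conversion and is deprecated as of PHP 8.2). The name toUtf8Naive is made up for illustration.

```php
<?php
// Hypothetical helper mirroring the approach described above:
// decode entities into ISO-8859-1 bytes, then convert those bytes to UTF-8.
function toUtf8Naive(string $input): string
{
    $latin1 = html_entity_decode($input, ENT_COMPAT, 'ISO-8859-1');
    // Same conversion utf8_encode() performs: every byte is read as Latin-1.
    return iconv('ISO-8859-1', 'UTF-8', $latin1);
}

$fromEntity = toUtf8Naive('Sc&eacute;nic'); // entity input
$fromUtf8   = toUtf8Naive("Sc\u{E9}nic");   // input that is already UTF-8

var_dump($fromEntity === "Sc\u{E9}nic");      // bool(true)
var_dump($fromUtf8 === "Sc\u{E9}nic");        // bool(false): encoded twice
var_dump($fromUtf8 === "Sc\u{C3}\u{A9}nic");  // bool(true): reads as "ScÃ©nic"
```

The second input breaks because its two UTF-8 bytes for é are each treated as a separate Latin-1 character and encoded again.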

Maybe I'm approaching this the wrong way, but in that case I'd love to know how I should approach it.

Answer 1:

Thanks to @XzKto and this comment on PHP.net I changed my slug function to the following:

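static function slug($input){
    // Decode HTML entities to UTF-8; text that is already UTF-8 is left untouched.
    $string = html_entity_decode($input, ENT_COMPAT, "UTF-8");

    // iconv's //TRANSLIT depends on LC_CTYPE, so switch to a UTF-8 locale temporarily.
    $oldLocale = setlocale(LC_CTYPE, '0');
    setlocale(LC_CTYPE, 'en_US.UTF-8');
    $string = iconv("UTF-8", "ASCII//TRANSLIT", $string);
    setlocale(LC_CTYPE, $oldLocale);

    // Collapse every run of non-alphanumeric characters into a single hyphen.
    return strtolower(preg_replace('/[^a-zA-Z0-9]+/', '-', $string));
}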

I feel like the setlocale part is a bit dirty, but this works perfectly for translating special characters to their 'normal' equivalents.

Input "a áñö ïß éèé" returns "a-ano-iss-eee".
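The reason one function can take both kinds of input is the first step: with the charset set to UTF-8, html_entity_decode only rewrites entities and passes existing UTF-8 characters through. A small sketch to check that in isolation (decodeToUtf8 is a hypothetical name, not part of the slug function above):

```php
<?php
// Hypothetical helper: the entity-decoding step of slug() on its own.
function decodeToUtf8(string $input): string
{
    // Only entities are rewritten; characters that are already UTF-8
    // pass through unchanged.
    return html_entity_decode($input, ENT_COMPAT, 'UTF-8');
}

$a = decodeToUtf8('Sc&eacute;nic'); // entity form
$b = decodeToUtf8("Sc\u{E9}nic");   // already UTF-8

var_dump($a === $b);                // bool(true): both forms converge
var_dump(decodeToUtf8($a) === $a);  // bool(true): a second pass changes nothing
```

The transliteration step is left out here on purpose, since the output of iconv with //TRANSLIT can vary with the locale and the iconv implementation installed on the server.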