PHP: mb_strtoupper not working

Posted 2019-04-28 15:12

Question:

I have a problem with UTF-8 and mb_strtoupper.

mb_internal_encoding('UTF-8');
$guesstitlestring='Le Courrier de Sáint-Hyácinthe';

$encoding=mb_detect_encoding($guesstitlestring);
if ($encoding!=='UTF-8') $guesstitlestring=mb_convert_encoding($guesstitlestring,'UTF-8',$encoding);

echo "DEBUG1 $guesstitlestring\n";
$guesstitlestring=mb_strtoupper($guesstitlestring);
echo "DEBUG2 $guesstitlestring\n";

Result:

DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE

I don't understand why this is happening. I'm trying to be as careful as I can with the encoding: the string starts out as UTF-8, is verified, and is reconverted to UTF-8 if necessary. It's a nightmare!

UPDATE

I've figured out that this was caused by a combination of entering the arguments via the console and reading the results back out of the console, so the text was garbled both on the way in and on the way out. The solution is to neither pass the arguments in nor read the output this way.
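One way to confirm this kind of console garbling is to look at the raw bytes PHP actually receives rather than at what the terminal displays. The following is a minimal sketch, not part of the original post; the helper name describe_bytes is hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): show a string together
// with its raw bytes in hex and whether those bytes form valid UTF-8.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return sprintf('%s [%s] %s', $s, bin2hex($s), $valid);
}

// Inspect every command-line argument, skipping the script name.
foreach (array_slice($_SERVER['argv'] ?? [], 1) as $i => $arg) {
    echo "arg $i: ", describe_bytes($arg), "\n";
}
```

In UTF-8 the character á is the two bytes c3 a1, while in ISO-8859-1 it is the single byte e1, so the hex dump shows which encoding the console really delivered.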

Thank you everyone for your help in resolving this issue!

Answer 1:

Instead of strtoupper()/mb_strtoupper(), use mb_convert_case(), since upper-case conversion is tricky across different encodings. Also make sure your string really IS UTF-8.

$content = 'Le Courrier de Sáint-Hyácinthe';

mb_internal_encoding('UTF-8');
if(!mb_check_encoding($content, 'UTF-8')
    OR !($content === mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8' ), 'UTF-8', 'UTF-32'))) {

    $content = mb_convert_encoding($content, 'UTF-8'); 
}

// LE COURRIER DE SÁINT-HYÁCINTHE
echo mb_convert_case($content, MB_CASE_UPPER, "UTF-8"); 

Working example: http://3v4l.org/enEfm#v443

See also my comment at the PHP website about the converter: http://www.php.net/manual/function.utf8-encode.php#102382



Answer 2:

It works for me, but only when the PHP file itself is saved as UTF-8 and the terminal I'm in expects UTF-8. I think what is happening for you is that the file is saved as ISO-8859-1 and your terminal is expecting ISO-8859-1.

First, mb_detect_encoding doesn't actually work for this string. Even when the PHP file is not UTF-8, it still reports it as UTF-8.
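The difference between guessing and validating can be shown with explicit byte strings, which makes the example independent of how the script file itself is saved. This is a small sketch, not the answerer's code; it uses mb_check_encoding and the strict mode of mb_detect_encoding, and the variable names are chosen for illustration.

```php
<?php
// "Sáint" as ISO-8859-1 bytes: á is the single byte 0xE1.
$latin1 = "S\xE1int";
// The same word as UTF-8 bytes: á is the two bytes 0xC3 0xA1.
$utf8 = "S\xC3\xA1int";

// mb_check_encoding answers a yes/no question instead of guessing.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// In strict mode, detection returns false when no candidate is an exact match.
var_dump(mb_detect_encoding($latin1, ['UTF-8'], true)); // bool(false)
var_dump(mb_detect_encoding($utf8, ['UTF-8'], true));   // string(5) "UTF-8"
```

Without the strict flag, mb_detect_encoding picks the closest candidate, which is why it can report UTF-8 for bytes that are not valid UTF-8.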

When you print the lower case string, it prints ISO-8859-1 characters and your terminal displays them just fine. Then when you convert to upper case using UTF-8, it gets mangled.

I created two versions of this file. I saved it using my text editor in ISO-8859-1 as iso-8859-1.php. Then I used iconv to convert the entire file to UTF-8 and saved it as utf-8.php

iconv -f ISO-8859-1 -t UTF-8 iso-8859-1.php > utf-8.php

I added a line to print the encoding that mb_detect_encoding returns.

$ file iso-8859-1.php 
iso-8859-1.php: PHP script, ISO-8859 text

$ php iso-8859-1.php 
ENCODING: UTF-8
DEBUG1 Le Courrier de S�int-Hy�cinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE

$ file utf-8.php 
utf-8.php: PHP script, UTF-8 Unicode text

$ php utf-8.php 
ENCODING: UTF-8
DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE SÁINT-HYÁCINTHE

My terminal actually expects UTF-8 text, so when I print out ISO-8859-1 text it gets mangled. Everything works correctly when the file is saved as UTF-8 and the terminal expects UTF-8.



Answer 3:

Actually, what works here is simply

<?php
mb_internal_encoding('UTF-8');

$x='Le Courrier de Sáint-Hyácinthe';
echo mb_strtoupper( $x ) . "\n";

outputs

LE COURRIER DE SÁINT-HYÁCINTHE

It works directly here, but in your case you may have to add utf8_encode, which converts ISO-8859-1 to UTF-8:

$x = utf8_encode( 'Le Courrier de Sáint-Hyácinthe' );
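Note that utf8_encode double-encodes a string that is already UTF-8, and the function is deprecated as of PHP 8.2. A guarded conversion with mb_convert_encoding is one possible substitute. The sketch below is not from the original answer and assumes that any input that is not valid UTF-8 really is ISO-8859-1; the byte string stands in for text read from a Latin-1 source.

```php
<?php
mb_internal_encoding('UTF-8');

// Assumption for this sketch: the input bytes are ISO-8859-1 (á = 0xE1).
$input = "Le Courrier de S\xE1int-Hy\xE1cinthe";

// Convert only when the bytes are not already valid UTF-8,
// so UTF-8 input is never encoded twice.
$utf8 = mb_check_encoding($input, 'UTF-8')
    ? $input
    : mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');

// Upper-case with the encoding stated explicitly.
$upper = mb_strtoupper($utf8, 'UTF-8');
echo $upper, "\n";
```

On a terminal that expects UTF-8 this prints the title with Á intact, because the bytes are converted before the case change rather than after.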

--

An alternative that works here without mbstring:

<?php
echo strtoupper(str_replace('á', 'Á', 'Le Courrier de Sáint-Hyácinthe'));