Why did this str_ireplace() work on a non ASCII st

Note: What I think I know is probably wrong, so please kindly fix my knowledge :)

I just answered a question about UTF-8 and PHP.

I suggested using str_ireplace('Волгоград', '', $a).

I didn't expect this to work, but it did.

I always thought PHP treated one byte as one character, hence why you need to use mb_* functions to get accurate results when using characters outside of ASCII range.

I assumed the Russian characters would take > 1 byte each.

I thought str_replace() would work because the bytes could be matched regardless of whether they are multibyte or not, as long as they are in order.

I thought str_ireplace() would not work because PHP wouldn't know how to map the non ASCII characters to their alternate case equivalent. But, it did work.

Where and how am I wrong? Give me as much information as you can :)

Tags: php utf-8 character-encoding

3 answers

Deceive 欺骗

#2 · 2019-06-17 02:51

It's the other way round: PHP does not treat every character as a byte; it treats every byte as a character. So a multibyte character is seen as several characters (and probably not the ones you expect).
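A quick way to see the difference is to count the same string both ways. This is only a sketch: it assumes the source file is saved as UTF-8 and that PCRE was built with Unicode support (mb_strlen() would do the same job if the mbstring extension is loaded).

```php
<?php
// Assumption: this file is saved as UTF-8.
$s = 'Волгоград';

// strlen() counts bytes, not characters.
$byteCount = strlen($s);

// With the /u modifier, "." matches one whole UTF-8 code point,
// so the number of matches is the number of characters.
$charCount = preg_match_all('/./su', $s);

echo $byteCount, " bytes, ", $charCount, " characters\n";
```

Each Cyrillic letter here takes two bytes in UTF-8, which is why the two counts differ.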


冷血范

#3 · 2019-06-17 02:54

Another possible explanation: the Unicode planes have attributes similar to those of the ISO-8859-1 range.

Converting an uppercase letter into lowercase just requires adding 0x20 for the ASCII range:

0x41   A
0x61   a

And (I did not bother to look it up) I think it's the same for the Latin-1 range 0xC0-0xDF. Coincidentally, this might work for the Russian letters in the Unicode range too:

d092d09ed09bd093d09ed093d0a0d090d094   ВОЛГОГРАД
d0b2d0bed0bbd0b3d0bed0b3d180d0b0d0b4   волгоград

The difference is mostly that 0x20 has been added to the bytes which were assumed to be Latin-1 characters (the pair for Р/р, d0a0 versus d180, does not follow that pattern). So it's probably really just a locale setting.
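The byte-level claim can be checked directly against the two hex dumps quoted above. The following is a rough sketch, not a statement about what PHP does internally; the variable names are made up for illustration.

```php
<?php
// Hex dumps copied from the answer above (UTF-8 bytes).
$upper = hex2bin('d092d09ed09bd093d09ed093d0a0d090d094'); // ВОЛГОГРАД
$lower = hex2bin('d0b2d0bed0bbd0b3d0bed0b3d180d0b0d0b4'); // волгоград

$plus20 = 0;   // number of byte positions where lower = upper + 0x20
$other  = [];  // byte positions that differ in some other way

for ($i = 0, $n = strlen($upper); $i < $n; $i++) {
    $u = ord($upper[$i]);
    $l = ord($lower[$i]);
    if ($u === $l) {
        continue;               // identical byte (most of the 0xD0 lead bytes)
    }
    if ($l - $u === 0x20) {
        $plus20++;              // the "add 0x20" pattern
    } else {
        $other[] = $i;          // anything else
    }
}

echo "+0x20 positions: ", $plus20, "\n";
echo "other differences at byte offsets: ", implode(', ', $other), "\n";
```

Eight trailing bytes follow the +0x20 pattern. The two bytes of Р (d0a0) and р (d180) do not, so a purely byte-wise "add 0x20" rule cannot account for the whole word.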


ゆ、 Hurt°

#4 · 2019-06-17 03:04

It works by lowercasing the text through the libc functions, which depend on the locale settings; with appropriate settings, the text is lowercased properly, provided the bytes are in the charset the locale expects.
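Because that behaviour depends on the locale (and, as far as I know, PHP 8.2 and later no longer consult the locale for these functions and only fold ASCII case), a replacement that should behave the same everywhere is better done with explicit Unicode case folding. A minimal sketch, assuming UTF-8 input and PCRE with Unicode support; utf8_ireplace() is a hypothetical helper, not a PHP built-in, and the sample string is made up:

```php
<?php
// Hypothetical helper: case-insensitive replace that does not depend
// on the locale. Assumes $search and $subject are valid UTF-8.
function utf8_ireplace(string $search, string $replace, string $subject): string
{
    // preg_quote() escapes regex metacharacters in the search text;
    // "i" = case-insensitive, "u" = treat pattern and subject as UTF-8.
    $pattern = '/' . preg_quote($search, '/') . '/iu';

    // A callback avoids "$1"-style sequences in $replace being interpreted.
    $result = preg_replace_callback(
        $pattern,
        function () use ($replace) {
            return $replace;
        },
        $subject
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8);
    // fall back to the unchanged subject in that case.
    return $result ?? $subject;
}

$a = 'город ВОЛГОГРАД, Россия';
$result = utf8_ireplace('Волгоград', '', $a);
echo $result, "\n";
```

Here the mixed-case search term still matches the all-uppercase word in the subject, because the "u" modifier makes the "i" modifier use Unicode case rules rather than byte values.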
