I have a PHP 5.3 script that displays the users of my web site, and I would like to replace a certain Russian city name (stored as UTF-8 in a PostgreSQL 8.4.7 database on 64-bit CentOS 5.5 Linux) with its older name (it is an inside joke):
preg_replace('/Волгоград/iu', 'Сталинград', $city);
Unfortunately this only works for exact matches: Волгоград.
This does not work for other cases, like ВОЛГОГРАД or волгоград.
If I modify my source code to
preg_replace('/[Вв]олгоград/iu', 'Сталинград', $city);
then it will catch the 2nd case above.
Does anybody know what is going on and how to fix it (assuming I don't want to write [Xx] for every letter)?
Thank you! Alex
UPDATE:
# rpm -qa|grep php
php53-bcmath-5.3.3-1.el5
php53-gd-5.3.3-1.el5
php53-common-5.3.3-1.el5
php53-pdo-5.3.3-1.el5
php53-mbstring-5.3.3-1.el5
php53-xml-5.3.3-1.el5
php53-5.3.3-1.el5
php53-cli-5.3.3-1.el5
php53-pgsql-5.3.3-1.el5
# rpm -qa|grep pcre
pcre-6.6-2.el5_1.7
Just guessing, but explicitly encoding the string to unicode may help:
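A minimal sketch of that idea, assuming the mbstring extension is loaded (php53-mbstring appears in the package list above). The helper name ensure_utf8 and the Windows-1251 fallback encoding are assumptions made for illustration; nothing in the question says the data is in that encoding. If $city is already valid UTF-8, this changes nothing and the cause is more likely on the PCRE side.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// The fallback source encoding (Windows-1251) is an assumption;
// adjust it to whatever the data source really delivers.
function ensure_utf8($text, $fallback = 'Windows-1251')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    // Not valid UTF-8: reinterpret the bytes as the fallback encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$city = ensure_utf8('Волгоград');
echo preg_replace('/Волгоград/iu', 'Сталинград', $city), "\n";
```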
Actually, with PHP 5.2.x on Windows the accepted answer did not work for me.
I had to convert to Windows-1251 to make it work.
Here is the example:
The example above will successfully (case-insensitively) substitute 'гъз' with YYYYYY and give you back the UTF-8 version.
Regards!
I copy+pasted your capital В. It is indeed the Cyrillic letter U+0412 (bytes D0 92 in UTF-8), not the normal Latin B. But since they look so much alike (В and B), I believe the Russian letter is collated onto the Latin B at U+0042.
So either PHP is preformatting it, or maybe PCRE is somewhat inexact there too. Test your print PCRE_VERSION; and have a look at the changelog.
Anyway, to evade the problem I would suggest you only use the lowercase letters. They are more likely to be distinct from the Latin alphabet.
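A quick way to run that check, as a rough sketch: PCRE_VERSION is a constant provided by PHP's pcre extension, while the function name pcre_folds_cyrillic is made up here for illustration. The probe simply tests whether the /iu modifiers match a lowercase Cyrillic letter against its uppercase form on the current build.

```php
<?php
// PCRE_VERSION reports the PCRE library PHP is using.
echo 'PCRE version: ', PCRE_VERSION, "\n";

// Hypothetical probe: does /iu fold case for a non-ASCII letter?
// Pattern is lowercase Cyrillic 'в', subject is uppercase Cyrillic 'В'.
function pcre_folds_cyrillic()
{
    // @ suppresses the warning some builds emit when /u is unsupported.
    return @preg_match('/в/iu', 'В') === 1;
}

echo 'Cyrillic case folding with /iu: ',
     pcre_folds_cyrillic() ? 'works' : 'does not work', "\n";
```

If the probe reports that folding does not work, writing [Xx] classes or avoiding PCRE for this replacement are possible workarounds.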
P.S.: Evil inside joke!
Works like a charm on my box...
Output:
Are you sure that input data ($city) is in UTF8?
You can skip the regex, it worked for me in PHP 5.2.11 :)
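One way a regex-free replacement might look, as a sketch only: the helper name mb_replace_ci is invented for illustration, and it assumes the mbstring extension is available and that the matched text has the same number of characters as the search string (true for Cyrillic, not guaranteed for every script).

```php
<?php
// Hypothetical helper: case-insensitive, multibyte-safe replacement
// that relies on mbstring instead of PCRE.
function mb_replace_ci($search, $replace, $subject, $encoding = 'UTF-8')
{
    $searchLen = mb_strlen($search, $encoding);
    if ($searchLen === 0) {
        return $subject; // nothing to search for
    }

    $subjectLen = mb_strlen($subject, $encoding);
    $result = '';
    $offset = 0;

    // Walk through every case-insensitive occurrence of $search.
    while (($pos = mb_stripos($subject, $search, $offset, $encoding)) !== false) {
        // Copy the text before the match, then the replacement.
        $result .= mb_substr($subject, $offset, $pos - $offset, $encoding) . $replace;
        $offset = $pos + $searchLen;
    }

    // Append whatever remains after the last match.
    return $result . mb_substr($subject, $offset, $subjectLen, $encoding);
}

echo mb_replace_ci('Волгоград', 'Сталинград', 'ВОЛГОГРАД'), "\n";
```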
Output
This intrigued me, so I asked a question.
I cannot reproduce your issue with PHP 5.3.3 (PHP 5.3.3-1ubuntu9.3 with Suhosin-Patch (cli)):
outputs
Which PCRE version is your PHP using? Check your phpinfo() for the pcre section. That's the one on my system: