Replacing empty space with preg_replace causes inv

Posted 2019-09-07 18:08

Question:

Our PHP web application (PHP 5.6.30 running on Windows Server 2008 R2) uses UTF-8 encoding but needs to import data from files that are encoded using Windows-1252. When the data is imported it is converted to UTF-8 as follows.

iconv('Windows-1252', 'UTF-8', $value);
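One way to check that this conversion step behaves as expected is to look at the raw bytes it produces. The following is only a minimal sketch: the helper name toUtf8FromWindows1252 is hypothetical and not part of the application, and the input is a hard-coded Windows-1252 byte rather than real import data.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original application).
// Converts a Windows-1252 string to UTF-8 and fails loudly if iconv cannot do it.
function toUtf8FromWindows1252($value)
{
    $converted = iconv('Windows-1252', 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the value');
    }
    return $converted;
}

// "\xE0" is the single Windows-1252 byte for the à character.
$utf8 = toUtf8FromWindows1252("\xE0");

// Print the resulting bytes in hex: c3a0 is the two-byte UTF-8 encoding of à.
echo bin2hex($utf8), "\n";
```

If the hex dump shows c3a0 for à, the iconv step itself is producing valid UTF-8 and any corruption must happen later in the pipeline.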

When we import the following sample data, the conversion works correctly for most of the Windows-1252 characters, but in line 8 below, the à character gives problems and is not correctly converted.

1;€
2;é
3;è
4;ë
5;ï
6;ä
7;á
8;à
9;ç
10;ß
11;ø 
12;í
13;ì
14;ñ
15;@
16;û

Here is a screenshot showing the result of displaying this data on the website.

Does anyone know why the PHP iconv is not correctly converting the à character?

回答1:

I resolved this issue and it ended up having nothing to do with iconv like I initially thought. The change that was required was such a small one, only one character, but it took me ages to hunt this down. It turns out that the offending statement was actually the following:

preg_replace('/\s+/', ' ', $columnvalue)

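The purpose of this regular expression is to collapse each run of white space in the value into a single space. Without the u modifier, however, the pattern is matched byte by byte rather than character by character. In UTF-8 the à character is encoded as the two bytes 0xC3 0xA0, and on some platforms (apparently including our Windows server) \s also matches the byte 0xA0, which is a non-breaking space in single-byte encodings such as Windows-1252. The second byte of à was therefore replaced with a space, leaving an invalid UTF-8 sequence. I resolved this by adding u (the Unicode modifier) to the end of the regular expression. So the expression became: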

preg_replace('/\s+/u', ' ', $columnvalue)

And then the encoding of the page was correct.
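For anyone who wants to see the fixed expression in isolation, here is a minimal sketch. The helper name collapseWhitespace and the sample string are assumptions made for illustration, not code from the application. It also checks for a null result, because with the u modifier preg_replace returns null when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original application).
// Collapses every run of white space into one space, treating the value as UTF-8.
function collapseWhitespace($value)
{
    $result = preg_replace('/\s+/u', ' ', $value);
    if ($result === null) {
        // With the u modifier, a subject that is not valid UTF-8 yields null.
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }
    return $result;
}

// "voilà" written with explicit UTF-8 bytes, followed by two spaces, a tab and a space.
$value = "voil\xC3\xA0  \t tout";

// The à character ends in the byte 0xA0, which is the byte at risk without the u modifier.
echo bin2hex("\xC3\xA0"), "\n";         // c3a0

// With the u modifier the two bytes of à are kept together and only real white space changes.
echo collapseWhitespace($value), "\n";  // voilà tout
```

Whether the version without the u modifier actually corrupts à depends on the platform and locale, so this sketch only demonstrates the behaviour of the fixed pattern.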