String Comparison using PHP mysql_* SET NAMES UTF

2019-07-22 20:17发布

问题:

I have a Mysql table with column State - the states are from across Europe - and the table and columns are in utf8_unicode_ci.

When I call the database I use

mysql_select_db($database_WTF, $WTF);
mysql_query('SET NAMES utf8');
$query_Recordset1 = "SELECT * 
     FROM newmeets
     WHERE newmeets.`State` IS NOT NULL 
     AND newmeets.`State` !=  ''
     ORDER BY newmeets.`State` ASC ";

I then run it though this simple loop

mysql_select_db($database_WTF, $WTF);
mysql_query('SET NAMES utf8');
$query_Recordset1 = "SELECT * 
     FROM newmeets
     WHERE newmeets.`State` IS NOT NULL 
     AND newmeets.`State` !=  ''
     ORDER BY newmeets.`State` ASC ";

$LastState = "";
do {
    echo
    var_dump($LastState == $row_Recordset1['State']);
    echo $row_Recordset1['State'];
    $LastState = $row_Recordset1['State'];
    var_dump($LastState == $row_Recordset1['State']);
    echo "<hr>";
} while ($row_Recordset1 = mysql_fetch_assoc($Recordset1));

The first time when it hits a new value as expected I get:

boolean false
BRUSSELS
boolean true

The next time I get:

boolean true
BRUSSELS
boolean true

Much as I would expect.

The problem comes when it hits non latin characters when each pass produces:

boolean false
Baden-Württember
boolean true

When I set them to be equal they are, and then when the next record is pulled (and each record has been entered identically) it fails, this is when I have used the same encoding, I actually need it to recognise they are the same even if the characters are entered differently.

Regardless of what is happening I have tried utf8_encode and Normalizer::normalize to get a true result from the comparison so I can use this in a control structure (e.g. if you've seen it last time don't print it this time) but it fails.

回答1:

It looks like you have a normalization problem in your database. Instead of storing the same state as string over and over again, put all state names into a table of it's own and reference them.

This will also ensure that you do not - e.g. by accident - put binary different but equally looking data into different rows you're not able to properly align later on as you just did.

Alternatively you should query distinct rows and update them, so that you do at least have the same binary string data for same-named states. E.g. if Mysql is able to actually align these state strings but PHP - due to it's binary nature of strings - is not.