ISO-8859-1 Character truncates text inserting into

2019-03-06 06:27发布

So I have a weird truncate issue! Can't find a specific answer on this.

So basically there's an issue with an apparent ISO character ½ that truncates the rest of the text upon insertion into a column with UTF-8 specified.

Lets say that my string is: "You need to add ½ cup of water." MySQL will truncate that to "You need to add"

if I:

print iconv("ISO-8859-1", "UTF-8//IGNORE", $text);

Then it outputs:

½

O_o

OK that doesn't work because I need the 1/2 by itself. If I go to phpMyAdmin and copy and paste the sentence in and submit it, it works like a charm as the whole string is in there with half symbol and remaining text! Something is wrong and I'm puzzled at what it is. I know this will probably affect other characters so the underlying problem needs to be addressed.

The language I'm using is php, the file itself is encoded as UTF-8 and the data I'm bringing in has content-type set to ISO-8859-1. The column is utf8_general_ci and all the mysql character sets are set to UTF-8 in php: "SET character_set_result = 'utf8', etc..."

3条回答
三岁会撩人
2楼-- · 2019-03-06 07:19

Check into mb_convert_encoding if you can't change the way the data is handled. Otherwise, do yourself a favor and get your encoding on the same page before it gets out of hand. UTF-8 uses multibyte characters which aren't recognized in the ISO-8859-1 (Latin) encoding. wikipedia. This page and this page are good sources, as well as this debug table.

Finally, I've run into this when various combinations of htmlentities, htmlspecialchars and html_entity_decode are used..

查看更多
霸刀☆藐视天下
3楼-- · 2019-03-06 07:26

Something in your code isn't handling the string as UTF8. It could be your PHP/HTML, it could be in your connection to the DB, or it could be the DB itself - everything has to be set as UTF8 consistently, and if anything isn't, the string will get truncated exactly as you see when passing across a UTF8/non-UTF8 boundary.

I will assume your DB is UTF8 compliant - that is easiest to check. Note that the collation can be set at the server level, database level, the table level, and the column level within the table. Setting UTF8 collation on the column should override anything else for storage, but the others will still kick in when talking to the DB if they're not also UTF8. If you're not sure, explicitly set the connection to UTF8 after you open it:

$dbh->setAttribute(PDO::MYSQL_ATTR_INIT_COMMAND, "SET NAMES 'utf8'");

Now your DB & connection are UTF8, make sure your web page is too. Again, this can be set in more than one place (.htaccess, php.ini). If you're not sure / don't have access, just override whatever PHP is picking up as default at the top of your page:

<?php ini_set('default_charset', 'UTF-8'); ?>

Note that you want the above right at the start, before any text is output from your page. Once text gets output, it is potentially too late to try and specify an encoding - you may already be locked into whatever is default on your server. I also then repeat this in my headers (possibly overkill):

<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
</head>

And I override it on forms where I'm taking data as well:

<FORM NAME="utf8-test" METHOD="POST" ACTION="utf8-test.php" enctype="multipart/form-data" accept-charset="UTF-8">"

To be honest, if you've set the encoding at the top, my understanding is that the other overrides aren't required - but I keep them anyway, because it doesn't break anything either, and I'd rather just state the encoding explicitly, than let the server make assumptions.

Finally, you mentioned that in phpMyAdmin you inserted the string and it looked as expected - are you sure though that the phpMyAdmin pages are UTF8? I don't think they are. When I store UTF8 data from my PHP code, it views like raw 8-bit characters in phpMyAdmin. If I take the same string and store it directly in phpMyAdmin, it looks 'correct'. So I'm guessing phpMyAdmin is using the default character set of my local server, not necessarily UTF8.

For example, the following string stored from my web page:

I can¹t wait

Reads like this in my phpMyAdmin:

I can’t wait

So be careful when testing that way, as you don't really know what encoding phpMyAdmin is using for display or DB connection.

If you're still having issues, try my code below. First I create a table to store the text in UTF8:

CREATE TABLE IF NOT EXISTS `utf8_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `my_text` varchar(8000) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

And here's some PHP to test it. It basically takes your input on a form, echoes that input back at you, and stores/retrieves the text from the DB. Like I said, if you view the data directly in phpMyAdmin, you might find it doesn't look right there, but through the page below it should always appear as expected, due to the page & db connection both being locked to UTF8.

<?php
  // Override whatever is set in php.ini
  ini_set('default_charset', 'UTF-8');

  // The following should not be required with the above override
  //header('Content-Type:text/html; charset=UTF-8');

  // Open the database
  $dbh = new PDO('mysql:dbname=utf8db;host=127.0.0.1;charset=utf8', 'root', 'password');

  // Set the connection to UTF8
  $dbh->setAttribute(PDO::MYSQL_ATTR_INIT_COMMAND, "SET NAMES 'utf8'");
  // Tell MySql to do the parameter replacement, not PDO
  $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
  // Throw exceptions (and break the code) if a query is bad
  $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

  $id = 0;
  if (isset($_POST["StoreText"]))
  {
    $stmt = $dbh->prepare('INSERT INTO utf8_test (my_text) VALUES (:my_text)');
    $stmt->execute(array(':my_text' => $_POST['my_text']));
    $id = $dbh->lastInsertId();
  }
?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional/EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">

<title>UTF-8 Test</title>
</head>

<body>

<?php
  // If something was posted, output it
  if (isset($_POST['my_text']))
  {
    echo "POSTED<br>\n";
    echo $_POST['my_text'] . "<br>\n";
  }

  // If something was written to the database, read it back, and output it
  if ($id > 0)
  {
    $stmt = $dbh->prepare('SELECT my_text FROM utf8_test WHERE id = :id');
    $stmt->execute(array(':id' => $id));
    if ($result = $stmt->fetch())
    {
      echo "STORED<br>\n";
      echo $result['my_text'] . "<br>\n";
    }
  }

  // Create a form to take some user input
  echo "<FORM NAME=\"utf8-test\" METHOD=\"POST\" ACTION=\"utf8-test.php\" enctype=\"multipart/form-data\" accept-charset=\"UTF-8\">";

  echo "<br>";

  echo "<textarea name=\"my_text\" rows=\"20\" cols=\"90\">";

  // If something was posted, include it on the form
  if (isset($_POST['my_text']))
  {
    echo $_POST['my_text'];
  }

  echo "</textarea>";

  echo "<br>";
  echo "<INPUT TYPE = \"Submit\" Name = \"StoreText\" VALUE=\"Store It\" />";

  echo "</FORM>";
?>
<br>

</body>

</html>
查看更多
放我归山
4楼-- · 2019-03-06 07:31

Did you call set_charset() on your MySQLi database connection? It's required to properly use real_escape_string().

$db = new mysqli(...);
$db->set_charset('utf8');

Setting session variables in your connection is not enough -- those affect what happens on the server-side. The set_charset will affect what happens client side.

You can checkout the PHP reference mysqli::real_escape_string

查看更多
登录 后发表回答