Strange character encoding of stored data, old sc

Posted 2019-01-01 07:42

I'm trying to rewrite an old website.
It's in Persian, which uses Perso-Arabic characters.

CREATE DATABASE `db` DEFAULT CHARACTER SET utf8 COLLATE utf8_persian_ci;
USE `db`;

Almost all of my tables and columns have their collation set to utf8_persian_ci.

I'm using CodeIgniter for my new script, and I have

'char_set' => 'utf8',
'dbcollat' => 'utf8_persian_ci',

in the database settings, so there is no problem there.

So here is the strange part:

The old script uses some sort of database engine called TUBADBENGINE (or TUBA DB ENGINE), nothing special.

When I enter some data (in Persian) into the database using the old script and then look in the database, the characters are stored like Ø¹Ù…Ø±Ø§Ù†.

The old script fetches and shows that data fine, but the new script shows it with the same weird characters as the database.

So when I enter عمران, the data stored in the database looks like Ø¹Ù…Ø±Ø§Ù†. When I fetch it in the new script I see Ø¹Ù…Ø±Ø§Ù†, but in the old script I see عمران.

CREATE TABLE IF NOT EXISTS `tnewsgroups` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `fName` varchar(200) COLLATE utf8_persian_ci DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci AUTO_INCREMENT=11 ;

--
-- Dumping data for table `tnewsgroups`
--

INSERT INTO `tnewsgroups` (`ID`, `fName`) VALUES
(1, 'عمران'),
(2, 'معماری'),
(3, 'برق'),
(4, 'مکانیک'),
(5, 'test'),
(6, 'test2');

On the other hand, when I enter ااااا directly in the database,

of course I have the same ااااا stored in the database.

The new script shows it fine,

but in the old script I get ????

Can anyone make sense of this?

Here is the Tuba engine:

https://github.com/maxxxir/mz-codeigniter-crud/blob/master/tuba.php

Usage example from the old script:

define("database_type" , "MYSQL");
define("database_ip" , "localhost");
define("database_un" , "root");
define("database_pw" , "");
define("database_name" , "nezam2");
define("database_connectionstring" , "");
$db = new TUBADBENGINE(database_type , database_ip , database_un , database_pw , database_name , database_connectionstring);
$db->Select("SELECT * FROM tnews limit 3");
if ($db->Lasterror() != "") { echo "<B><Font color=red>ÎØÇ ! áØÝÇ ãÌÏøÏÇ ÊáÇÔ ˜äíÏ";  exit(); }
for ($i = 0 ; $i < $db->Count() ; $i++) {
    $row = $db->Next();
    var_dump($row);
}

1 answer
不再属于我。
Reply #2 · 2019-01-01 08:16

In short, because this has been discussed a thousand times before:

  1. PHP holds a string, say "漢字", encoded in UTF-8. The bytes for this are E6 BC A2 E5 AD 97.
  2. It sends this string over a database connection which is set to latin1.
  3. The database receives the bytes E6 BC A2 E5 AD 97, thinking those represent latin1 characters.
  4. The database stores the characters æ¼¢å­ (the characters that E6 BC A2 E5 AD 97 maps to in latin1).
  5. The same process reversed makes PHP receive the same bytes, which it then treats as UTF-8. The roundtrip works fine for PHP, even though the database doesn't treat the characters as it should.
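
The steps above can be replayed without a database. Below is a minimal Python sketch, assuming that Python's `latin-1` codec (ISO-8859-1) is a close enough stand-in for MySQL's latin1, which is really closer to cp1252; the variable names are made up for illustration. It also models the reverse case from the question, where correctly stored text read over a latin1 connection comes back as question marks.

```python
# Sketch of the round trip from the list above; no database is involved.
# Assumption: Python's "latin-1" stands in for MySQL's latin1 charset.
original = "漢字"

# Step 1: PHP holds the string as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Steps 2-4: on a latin1 connection the server reads each byte as one
# latin1 character, so six characters are stored instead of two.
stored = utf8_bytes.decode("latin-1")

# Step 5: reading back over the same latin1 connection returns the same
# bytes, which PHP again treats as UTF-8.
round_tripped = stored.encode("latin-1").decode("utf-8")

# Reverse case from the question: correctly stored Persian text read over
# a latin1 connection. Characters with no latin1 equivalent become "?".
seen_by_latin1_client = "عمران".encode("latin-1", errors="replace").decode("latin-1")

print(utf8_bytes.hex(" ").upper())
print(len(stored), round_tripped)
print(seen_by_latin1_client)
```

A client that connects with utf8, as the new CodeIgniter configuration in the question does, receives the six stored characters as they are, which is why the new script shows the garbled form while the old script, presumably connecting as latin1, gets its original bytes back.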

So the problem here was that the database connection was set incorrectly when the data was entered into the database. You'll have to convert the data in the database to the correct characters. Try this:

SELECT CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) FROM table_name

Maybe utf8 isn't what you need here, experiment. If that works, change this into an UPDATE statement to update the data permanently.
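
To make the conversion less of a black box, here is a rough Python equivalent of what the nested CONVERT does to one value: turn the stored characters back into latin1 bytes, then read those bytes as UTF-8. This is only a sketch under assumptions: `repair_mojibake` is a hypothetical helper, and Python's `latin-1` codec is used in place of MySQL's latin1 (which is closer to cp1252), so values pulled from a real database may need a different codec. Doing the fix in SQL avoids that mismatch.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were stored as latin1 characters.

    Hypothetical helper. Returns the text unchanged when it does not
    look double-encoded, so already-correct values are left alone.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate a value written through a latin1 connection (assumption:
# Python's "latin-1" approximates MySQL's latin1 here).
broken = "عمران".encode("utf-8").decode("latin-1")

print(repair_mojibake(broken) == "عمران")
print(repair_mojibake("test"))
print(repair_mojibake("عمران") == "عمران")
```

The fallback matters because the table in the question mixes both kinds of rows: values entered through the old script are double-encoded, while values entered directly are already correct and should not be converted a second time.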
