Encoding SQL_Latin1_General_CP1_CI_AS into UTF-8

2019-01-18 05:37发布

I'm generating a XML file with PHP using DomDocument and I need to handle asian characters. I'm pulling data from the MSSQL2008 server using the pdo_mssql driver and I apply utf8_encode() on the XML attribute values. Everything works fine as long as there's no special characters.

The server is MS SQL Server 2008 SP3

The database, table and column collation are all SQL_Latin1_General_CP1_CI_AS

I'm using PHP 5.2.17

Here's my PDO object:

$pdo = new PDO("mssql:host=MyServer,1433;dbname=MyDatabase", user123, password123);

My query is a basic SELECT.

I know storing special characters into SQL_Latin1_General_CP1_CI_AS columns isn't great, but ideally it would be nice to make it work without changing it, because other non-PHP programs already use that column and it works fine. In SQL Server Management Studio I can see the asian characters correctly.

Considering all the details above, how should I process the data?

6条回答
叼着烟拽天下
2楼-- · 2019-01-18 05:40

Thanks @SGr for answer.
I found out a better way for doing that :

SELECT CAST(CAST(MY_COLUMN AS VARBINARY(MAX)) AS VARCHAR(MAX)) as MY_COLUMN FROM MY_TABLE;
and also try with:
SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) as MY_COLUMN FROM MY_TABLE;

And in PHP you should just convert it to UTF-8 :

$string = iconv('UCS-2LE', 'UTF-8', $row['MY_COLUMN']);

查看更多
放荡不羁爱自由
3楼-- · 2019-01-18 05:41

You can try so:

header("Content-Type: text/html; charset=utf-8");
$dbhost   = "hostname";
$db       = "database";
$query = "SELECT *
    FROM Estado
    ORDER BY Nome";
$conn = new PDO( "sqlsrv:server=$dbhost ; Database = $db", "", "" );
$stmt = $conn->prepare( $query, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL, PDO::SQLSRV_ATTR_CURSOR_SCROLL_TYPE => PDO::SQLSRV_CURSOR_BUFFERED, PDO::SQLSRV_ENCODING_SYSTEM) );
$stmt->execute();
while ( $row = $stmt->fetch( PDO::FETCH_ASSOC ) )
{
// CP1252 == code page Latin1
print iconv("CP1252", "ISO-8859-1", "$row[Nome] <br>");
}
查看更多
在下西门庆
4楼-- · 2019-01-18 05:57

I found how to solve it, so hopefully this will be helpful to someone.

First, SQL_Latin1_General_CP1_CI_AS is a strange mix of CP-1252 and UTF-8. The basic characters are CP-1252, so this is why all I had to do was UTF-8 and everything worked. The asian and other UTF-8 characters are encoded on 2 bytes and the php pdo_mssql driver seems to hate varying length characters so it seems to do a CAST to varchar (instead of nvarchar) and then all the 2 byte characters become question marks ('?').

I fixed it by casting it to binary and then I rebuild the text with php:

SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;

In php:

//Binary to hexadecimal
$hex = bin2hex($bin);

//And then from hex to string
$str = "";
for ($i=0;$i<strlen($hex) -1;$i+=2)
{
    $str .= chr(hexdec($hex[$i].$hex[$i+1]));
}
//And then from UCS-2LE/SQL_Latin1_General_CP1_CI_AS (that's the column format in the DB) to UTF-8
$str = iconv('UCS-2LE', 'UTF-8', $str);
查看更多
够拽才男人
5楼-- · 2019-01-18 06:02

By default, PDO uses PDO::SQLSRV_ENCODING_UTF8 for sending/receiving data.

If your current collate is LATIN1, have you tried specifiying PDO::SQLSRV_ENCODING_SYSTEM to let PDO know that you want to use the current system encoding instead of UTF-8 ?

You could even use PDO::SQLSRV_ENCODING_BINARY which returns data in a binary form (no encoding or translation is done when transfering data). This way, you could handle character encoding on your side.

More documentation here: http://ca3.php.net/manual/en/ref.pdo-sqlsrv.php

查看更多
冷血范
6楼-- · 2019-01-18 06:03

For me, none of the above was the direct solution--though I did use parts of above solutions. This worked for me with the Vietnamese alphabet. If you come across this post and none of the above work for you, try:

    $req = "SELECT CAST(MY_COLUMN as VARBINARY(MAX)) as MY_COLUMN FROM MY_TABLE"; 
    $stmt = $conn->prepare($req);
    $stmt->execute();
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $str = pack("H*",$row['MY_COLUMN']);
        $str = mb_convert_encoding($z, 'HTML-ENTITIES','UCS-2LE');
        print_r($str);
    }

And a little bonus--I had to json_encode this data and was (duh) getting html code instead of the special characters. to fix just use html_entity_decode() on the strings before sending with json_encode.

查看更多
We Are One
7楼-- · 2019-01-18 06:07

I know this post is old, but the only thing that work for me was iconv("CP850", "UTF-8//TRANSLIT", $var); I had the same issues with SQL_Latin1_General_CP1_CI_AI, maybe it work for SQL_Latin1_General_CP1_CI_AS too.

查看更多
登录 后发表回答