Need help querying UTF8 strings from Vertica with

Posted 2019-08-08 16:24

Question:

I've been having trouble figuring out the best way to handle UTF-8 characters in PHP. I can load UTF-8 data (Chinese characters) into Vertica just fine, and I can see the characters there when using a JDBC client, so I know the data is being stored correctly.

However, when I query via PHP, strings that contain UTF-8 characters come through as nulls. As a workaround, I can wrap the UTF-8 field in the URI_PERCENT_ENCODE function and then call urldecode on the result in PHP, which outputs the characters correctly.
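The workaround above can be sketched as follows. This is only an illustration, written in Python rather than PHP: `urllib.parse.unquote` plays the role of PHP's `urldecode`, and `quote` stands in for the value the query would return, assuming URI_PERCENT_ENCODE percent-encodes the UTF-8 bytes of the string. The helper names, table name, and column name are hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical query, for context only:
#   SELECT URI_PERCENT_ENCODE(name) FROM mytable;
# The result is pure ASCII, so it passes through the driver unharmed.


def simulate_percent_encoded_column(value: str) -> str:
    """Stand-in for the server side: percent-encode the UTF-8 bytes."""
    return quote(value, safe="")


def decode_column(encoded: str) -> str:
    """Client side: the equivalent of calling urldecode() in PHP."""
    return unquote(encoded, encoding="utf-8")


encoded = simulate_percent_encoded_column("时髦")
print(encoded)
print(decode_column(encoded))
```

The trade-off is that every affected column has to be wrapped in the query and decoded in the client, which is why a driver-level setting would be preferable.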

Are there any ODBC driver settings, or PHP settings that you can recommend to handle UTF8 more gracefully?

We are running 64-bit PHP 5.3.

Answer 1:

For whatever it's worth, when working with the Vertica 64-bit ODBC driver for Windows and calling SQLDescribeColW to describe a table with a Chinese name and Chinese column names (i.e. describing an SQL statement like 'select * from mytable'), the names are returned encoded in "funky UTF-8".

The "funky UTF-8" or FUTF-8 encoding uses wchar_t[] (on Windows, an array of 16-bit values) in which each array entry holds a single byte of the real UTF-8 encoding.

For example, if the column name was "时髦" whose UTF-16 encoding is 65f6h,9ae6h (two characters, 16 bits each) and its UTF-8 encoding is e6h, 97h, b6h, e9h, abh, a6h (two characters, 3 bytes each) then in FUTF-8 you'd get: 00e6h, 0097h, 00b6h, 00e9h, 00abh, 00a6h (6 characters, 16 bits each).
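As a minimal sketch of how such a value could be repaired, assuming the driver really does return one UTF-8 byte per 16-bit unit as described, the units can be narrowed back to bytes and decoded as ordinary UTF-8. The function names are hypothetical and the example is in Python only for illustration.

```python
def decode_futf8(units):
    """Rebuild text from 16-bit units that each carry one UTF-8 byte."""
    if any(u > 0xFF for u in units):
        # A real UTF-16 string would have units above 0xFF for these characters.
        raise ValueError("not FUTF-8: a code unit does not fit in one byte")
    return bytes(units).decode("utf-8")


def repair_futf8_string(text):
    """Same repair when the units have already been turned into a string."""
    # Latin-1 maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF.
    return text.encode("latin-1").decode("utf-8")


# The six units from the example above.
futf8_units = [0x00E6, 0x0097, 0x00B6, 0x00E9, 0x00AB, 0x00A6]
print(decode_futf8(futf8_units))
print(repair_futf8_string("".join(chr(u) for u in futf8_units)))
```

Both helpers raise an error rather than guess when the input is not in this form, so correctly encoded UTF-16 data is not silently mangled.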

My guess is that this is what causes PHP to end up with nulls. I'd call it a bug in the ODBC driver.