Parsing XML with Unicode characters in ColdFusion

Posted 2019-07-03 12:59

I'm connecting to an external API using cfhttp, with the returned data in XML format. I have no control over the API or the format it's returned in.

When the data is returned, I loop through it and do cfquery inserts into my own MySQL database, which has a UTF8 charset.

However, some of the data appears to contain Unicode characters that are not coming through correctly. One of them should be the £ (pound) sign, but when I cfdump the parsed XML, it shows as a diamond with a ? inside. I've attached a cropped screenshot of part of the cfdump showing this.


The problem is the cfquery insert: when it reaches those characters, it returns this error:

Error Executing Database Query.

Incorrect string value: '\xEF\xBF\xBD10 ...' for column 'voucherTitle' at row 1

I've tried setting the charset in the cfhttp call, but get the same result.

Is there any way I can either encode/decode these, or alternatively trim them out altogether? (The data gets edited further down the line anyway, so manually adding the correct symbols isn't a huge issue.)

1 Answer
女痞
Reply #2 · 2019-07-03 13:25

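UPDATE: As of MySQL 5.5.3, there is also utf8mb4, which is often recommended over utf8 because it covers the full Unicode range (up to four bytes per character), whereas utf8 stores at most three bytes per character.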


(From the comments)

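I recall something similar on another thread. An "Incorrect string value" error usually means the column's character set cannot store the character being inserted, so double-check the collation and character set for that column using the INFORMATION_SCHEMA.COLUMNS view: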

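 SELECT  COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
 FROM    INFORMATION_SCHEMA.COLUMNS
 WHERE   TABLE_SCHEMA = DATABASE()   -- current database only
 AND     TABLE_NAME = 'YourTableName';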

If the column is not UTF-8, you can change it with the ALTER TABLE command. Replace M with the column's actual size.

 ALTER TABLE YourTableName 
    MODIFY YourColumnName VARCHAR(M) 
    CHARACTER SET utf8;
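
Following the update at the top of this answer, the same change can target utf8mb4 instead. This is only a sketch: it assumes MySQL 5.5.3 or later, and the size 255 and the utf8mb4_unicode_ci collation are illustrative choices, not values taken from the question.

```sql
-- Sketch only: YourTableName / YourColumnName are the placeholders used above;
-- VARCHAR(255) and utf8mb4_unicode_ci are assumed values, adjust to your schema.
ALTER TABLE YourTableName
    MODIFY YourColumnName VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

-- Confirm that the column now reports the new character set and collation.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_SCHEMA = DATABASE()
AND    TABLE_NAME = 'YourTableName'
AND    COLUMN_NAME = 'YourColumnName';
```

Note that MODIFY redefines the whole column, so any existing attributes such as NOT NULL or DEFAULT need to be repeated in the statement or they will be dropped.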

NB: If the data is important, always make a backup of the table before applying any modifications.
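
One simple way to take that backup from within MySQL is to copy the table before altering it. A minimal sketch, assuming the hypothetical name YourTableName_backup is not already in use; mysqldump or any other backup method would serve equally well.

```sql
-- Create an empty copy with the same column definitions and indexes.
CREATE TABLE YourTableName_backup LIKE YourTableName;

-- Copy every row into the backup table.
INSERT INTO YourTableName_backup
SELECT * FROM YourTableName;

-- Sanity check: the two counts should match before you run ALTER TABLE.
SELECT (SELECT COUNT(*) FROM YourTableName)        AS original_rows,
       (SELECT COUNT(*) FROM YourTableName_backup) AS backup_rows;
```

CREATE TABLE ... LIKE does not copy foreign key definitions, so treat the copy as a data backup rather than an exact structural clone.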

See also: 11.1.15 Character Sets and Collations Supported by MySQL
