After noticing that an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that they would accept them. This fixed most of the errors, and the application also stopped getting SQL errors when it hit non-Latin emails.
Despite this, some of the emails still cause the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
- What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
- How can I fix it?
- What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
I would not suggest Richie's answer, because you are corrupting the data inside the database. You would not fix your problem, only "hide" it, and you would not be able to perform essential database operations on the corrupted data.
If you encounter this error, either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
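A minimal Python sketch of that first check, using only the standard library: it reports whether a byte string is valid UTF-8 and, if not, the offset of the first byte that fails to decode. The function name is hypothetical. The sample bytes are the ones from the error message in the question; reading them as KOI8-R (a legacy Cyrillic charset) is only an illustrative guess about where such an email might come from.

```python
from typing import Optional, Tuple


def check_utf8(data: bytes) -> Tuple[bool, Optional[int]]:
    """Return (True, None) if data is valid UTF-8, else (False, offset of the first bad byte)."""
    try:
        data.decode("utf-8")
        return True, None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return False, exc.start


# Bytes taken from the question's error message
sample = b"\xE4\xC5\xCC\xC9\xD3\xD8"
print(check_utf8(sample))  # (False, 0): 0xE4 is not followed by a continuation byte

# Illustrative guess only: the same bytes decode cleanly as KOI8-R (Cyrillic)
print(sample.decode("koi8_r"))
```

If a check like this fails on the raw message, the fix belongs where the data is read or converted, not in the column definition.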
Then, check your database connection; you should do this after connecting:
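The statement for this step is missing from the answer; `SET NAMES` is one standard MySQL way to do it and is an assumption here. The sketch below assumes a DB-API 2.0 driver (for example PyMySQL or mysql-connector-python, neither of which the answer names). `SET NAMES` cannot take a bound parameter, so the hypothetical helper only accepts a small whitelist of charset names; utf8mb4 is included because 4-byte characters need it.

```python
ALLOWED_CHARSETS = {"utf8", "utf8mb4"}


def set_names_sql(charset: str = "utf8") -> str:
    """Build a SET NAMES statement for a whitelisted charset name."""
    if charset not in ALLOWED_CHARSETS:
        # Refuse anything else, since the name is interpolated into SQL
        raise ValueError(f"unsupported charset: {charset!r}")
    # SET NAMES sets character_set_client, character_set_connection
    # and character_set_results for the current session
    return f"SET NAMES {charset}"


def configure_connection(cursor, charset: str = "utf8") -> None:
    """Run SET NAMES right after connecting, using any DB-API cursor."""
    cursor.execute(set_names_sql(charset))
```

Many drivers also accept a charset argument when connecting, which avoids the extra statement; check your driver's documentation.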
Next, verify that the tables where the data is stored have the utf8 character set:
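One way to script the table check, assuming a DB-API driver with the `%s` parameter style (PyMySQL and mysql-connector-python both use it) and a default cursor that returns tuples. The query maps each table's default collation to its character set through `information_schema`; the function names are hypothetical.

```python
TABLE_CHARSET_SQL = """
SELECT t.TABLE_NAME, c.CHARACTER_SET_NAME
FROM information_schema.TABLES AS t
JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS c
  ON c.COLLATION_NAME = t.TABLE_COLLATION
WHERE t.TABLE_SCHEMA = %s
"""

UTF8_NAMES = {"utf8", "utf8mb3", "utf8mb4"}


def non_utf8_tables(rows):
    """Given (table_name, charset) rows, return the tables whose charset is not utf8."""
    return [name for name, charset in rows if charset.lower() not in UTF8_NAMES]


def check_tables(cursor, schema: str):
    """Run the query for one schema and return the offending table names."""
    cursor.execute(TABLE_CHARSET_SQL, (schema,))
    return non_utf8_tables(cursor.fetchall())
```

This reports table defaults only; an individual column can override them, and `information_schema.COLUMNS` has a `CHARACTER_SET_NAME` column for checking that.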
Last, check your database settings:
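A sketch of this last check, assuming the same kind of DB-API cursor (returning `(Variable_name, Value)` tuples) and using `SHOW VARIABLES LIKE 'character_set_%'`. It deliberately skips `character_set_filesystem` and `character_set_system`, which govern file names and identifiers rather than how row data is converted. The function names are hypothetical.

```python
UTF8_NAMES = {"utf8", "utf8mb3", "utf8mb4"}

# Variables that affect how text travels between client and stored rows
RELEVANT = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)


def non_utf8_settings(rows):
    """Return {variable: value} for relevant variables that are not utf8 (missing/NULL -> None)."""
    settings = dict(rows)
    return {
        name: settings.get(name)
        for name in RELEVANT
        if (settings.get(name) or "").lower() not in UTF8_NAMES
    }


def check_server(cursor):
    """Fetch the character_set_* variables and report the non-utf8 ones."""
    cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
    return non_utf8_settings(cursor.fetchall())
```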
If source, transport and destination are UTF-8, your problem is gone;)
What I did was first change the column type to LONGBLOB, insert the data, and then change the column type to VARCHAR(255). The data was not that sensitive, so I took the risk, and the data set was large too (around 40k entries). I suggest you try this only if you don't have any data you can't afford to have distorted.
I have tried all of the above solutions (which all bring valid points), but nothing worked for me.
Then I found that my MySQL table field mappings in C# were using an incorrect type: MySqlDbType.Blob. I changed it to MySqlDbType.Text, and now I can write all the UTF-8 symbols I want!
P.S. My MySQL table field is of the LONGTEXT type. However, when I autogenerated the field mappings using the MyGeneration software, it automatically set the field type to MySqlDbType.Blob in C#.
Interestingly, I had been using the MySqlDbType.Blob type with UTF-8 characters for many months with no trouble, until one day I tried writing a string containing some specific characters.
Hope this helps someone who is struggling to find a reason for the error.
I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2. ...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'. It turned out that MySQL had shown me the wrong error. I changed the character set of the column back to utf8 and changed the type of the column to MEDIUMTEXT. After that, the error disappeared. I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) just truncated the text without an error.
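One possible reason the "Data too long" message appeared: MySQL's TEXT-family limits are measured in bytes, not characters, and Cyrillic letters like those in that error take two bytes each in UTF-8. The answer does not say what the column type was before MEDIUMTEXT; assuming it was TEXT, a hypothetical helper like this shows which type a given value needs.

```python
# Maximum storage per value, in bytes, for MySQL's TEXT types
TEXT_LIMITS = (
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
)


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT type whose byte limit holds value in the given encoding."""
    size = len(value.encode(encoding))  # limits count bytes, not characters
    for type_name, limit in TEXT_LIMITS:
        if size <= limit:
            return type_name
    raise ValueError(f"value is {size} bytes, too large even for LONGTEXT")


print(smallest_text_type("a" * 40_000))   # TEXT: 40,000 bytes
print(smallest_text_type("ов" * 20_000))  # MEDIUMTEXT: 40,000 characters, 80,000 bytes
```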
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
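A short Python illustration of that risk: a BLOB keeps whatever bytes a client sends, so text written in one encoding and read back in another comes out garbled. No database is involved here; `encode`/`decode` stand in for the two clients, and the helper name is hypothetical.

```python
def store_and_read(text: str, writer_encoding: str, reader_encoding: str) -> str:
    """Simulate one client writing raw bytes to a BLOB and another client reading them."""
    stored = text.encode(writer_encoding)  # a BLOB column keeps these bytes untouched
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return stored.decode(reader_encoding, errors="replace")


print(store_and_read("café", "utf-8", "utf-8"))    # café
print(store_and_read("café", "utf-8", "cp1252"))   # cafÃ©
print(store_and_read("café", "cp1252", "utf-8"))   # caf followed by U+FFFD
```

Agreeing on one encoding for every writer and reader, as the answer suggests, is what keeps the first case the only one that happens.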
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details on the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other discussions of these two types on the web.
MySQL's utf8 character set is not actually proper UTF-8: it uses at most three bytes per character and supports only the Basic Multilingual Plane (i.e. no emoji, no astral planes, etc.).
If you need to store characters from higher Unicode planes, you need the utf8mb4 encoding.
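A quick way to check whether a given string needs utf8mb4, sketched in Python under the assumption that the text is already decoded: any code point above U+FFFF (outside the Basic Multilingual Plane) takes four bytes in UTF-8 and does not fit in MySQL's three-byte utf8. The function names are hypothetical.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text contains a character outside the BMP (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def astral_chars(text: str) -> list:
    """List the characters that MySQL's 3-byte utf8 cannot store."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


print(needs_utf8mb4("Привет, world"))     # False
print(needs_utf8mb4("hi \U0001F600"))     # True (an emoji)
print(len("\U0001F600".encode("utf-8")))  # 4
```

This only covers 4-byte characters; bytes that are not valid UTF-8 at all are a separate problem that utf8mb4 does not solve.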