Data too long for column error with national chara

2019-05-22 05:41发布

问题:

I have to port some DBS into stand alone MySQL Version: 5.0.18 running on Windows7 64bit and I got a problem I am stuck with. If I try to insert any national/unicode character into varchar I got error:

ERROR 1406 (22001): Data too long for column 'nam' at row 1

Here MCVE SQL script:

SET NAMES utf8;

DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(1024) default '' ) DEFAULT CHARSET=utf8;
INSERT INTO `tab` VALUES (1,'motorček');
INSERT INTO `tab` VALUES (2,'motorcek');
SELECT * FROM `tab`;

And here output:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE IF EXISTS `tab`;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(1024) default '' ) DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `tab` VALUES (1,'motorček');
ERROR 1406 (22001): Data too long for column 'nam' at row 1
mysql> INSERT INTO `tab` VALUES (2,'motorcek');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM `tab`;
+------+----------+
| ix   | nam      |
+------+----------+
|    2 | motorcek |
+------+----------+
1 row in set (0.00 sec)

As you can see the entry with national character č E8h is missing.

I am aware of these QAs:

  • How to make MySQL handle UTF-8 properly
  • “Data too long for column” - why?
  • Error Code: 1406. Data too long for column - MySQL

but they do not address this problem (no solution from any of those work for this).

This problem is present even for single character strings. No matter the size of VARCHAR. So the only solution for now is change the national characters into ASCII but that would lose information which I would rather avoid.

I tried using various character sets utf8, ucs2, latin1 without any effect.

I tried drop the STRICT_TRANS_TABLES as some of the other answers suggest but that has no effect with this either (and the string size is many times bigger than needed).

Does anyone have any clues? May be it has something to do with fact that this MySQL server is standalone (it is not installed) it is started with this cmd:

@echo off

bin\mysqld --defaults-file=bin\my.ini --standalone --console --wait_timeout=2147483 --interactive_timeout=2147483

if errorlevel 1 goto error
goto finish

:error
echo.
echo MySQL could not be started
pause

:finish

and queries are done inside console started like this cmd:

@echo off
    bin\mysql.exe -uroot -h127.0.0.1 -P3306
rem bin\mysql.exe -uroot -proot -h127.0.0.1 -P3306

回答1:

Well looking at the char č code E8h (while writing question) It does not look like UTF8 but rather a extended ASCII (code above 7Fh) which finally pointed me to try this MySQL script:

SET NAMES latin1;
DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(1024) default '' );
INSERT INTO `tab` VALUES (1,'motorček');
INSERT INTO `tab` VALUES (2,'motorcek');
SELECT * FROM `tab`;

Which finally works (silly me I thought I already tried it before without correct result). So my error was to force Unicode (which was set as default) for Non Unicode strings (which I think should work). Here the result:

mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE IF EXISTS `tab`;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(1024) default '' );
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO `tab` VALUES (1,'motorček');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO `tab` VALUES (2,'motorcek');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM `tab`;
+------+----------+
| ix   | nam      |
+------+----------+
|    1 | motorček  |
|    2 | motorcek |
+------+----------+
2 rows in set (0.00 sec)

But as you can see there is some discrepancy in the table formatting but that does not matter much as the presentation will be done in C++ anyway.

Without writing this Question I would probably going in circles for hours or even days. Hopefully this helps others too.

[Edit1]

Now I got another problem caused by Windows. If I pass the script with Clipboard or type it myself all is OK but if I use source file then the national characters will go wrong (and the -e option does not help either). As I need to use files I am still looking for solution. But as this is different problem I decided to Ask new question:

  • Using source command corrupts non Unicode text encoding