How to fix “Incorrect string value” errors?

2019-01-01 05:12发布

After noticing an application tended to discard random emails due to incorrect string value errors, I went though and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting sql errors when it hit non-latin emails, too.

Despite this, some of the emails are still causing the program to hit incorrect string value errrors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)

The contents column is a MEDIUMTEXT datatybe which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.

Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:

  • What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
  • How can I fix it?
  • What are the likely effects of such a fix?

One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.

标签: mysql
19条回答
余生无你
2楼-- · 2019-01-01 05:41

There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)

For me the error happened for the following field:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;

This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:

package redacted;

import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;

import static java.lang.String.format;

public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        return attribute.toString();
    }

    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            throw new RuntimeException(format("could not convert database field to URI: %s", field));
        }
    }
}

Used as follows:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;

As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).

查看更多
与君花间醉酒
3楼-- · 2019-01-01 05:42

To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).

查看更多
牵手、夕阳
4楼-- · 2019-01-01 05:44

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:

>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is the most permissive encoding there is - every byte value is a valid code point.

Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?

查看更多
高级女魔头
5楼-- · 2019-01-01 05:44

Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.

ALTER TABLE tabale_name MODIFY COLUMN column_name VARCHAR(255)  
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
查看更多
倾城一夜雪
6楼-- · 2019-01-01 05:47

That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.

In practice, the latter problem is caused by MySQL UTF-8 implementation that only supports UNICODE characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details.

查看更多
呛了眼睛熬了心
7楼-- · 2019-01-01 05:47

1 - You have to declare in your connection the propertie of enconding UTF8. http://php.net/manual/en/mysqli.set-charset.php.

2 - If you are using mysql commando line to execute a script, you have to use the flag, like: Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

查看更多
登录 后发表回答