DB2 UTF-8 encoding: Umlaut to CHAR(1)?

Posted 2019-09-12 16:18

Question:

What does "CHAR(1)" in a UTF-8 encoded DB2 database mean?

Can I insert a special character (e.g. one that takes 2 octets in UTF-8) into a column of CHAR(1)?

Or does CHAR(1) in a UTF-8 database always mean a capacity of one byte (octet), so that inserting an umlaut into it will fail?

I read through this interesting developerWorks article, but it goes too deep for my simple question.

Answer 1:

It depends. :)

DB2 introduced string units (code units) to help with designing string-typed columns whose length is based on the number of characters rather than the number of bytes. The documentation of the CREATE TABLE statement has an overview of the data types and also explains CHAR and VARCHAR. If the length is counted in characters, DB2 assumes the worst case (4 bytes/octets per character) for its length computations.
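To make the worst-case computation concrete, here is a small Python sketch. It is only a model, not a DB2 API: the `BYTES_PER_UNIT` table and the `max_bytes` helper are hypothetical names, and the unit labels simply mirror the `CHAR(n OCTETS)` / `CHAR(n CODEUNITS32)` notation. It shows how many bytes a declared CHAR length corresponds to under byte semantics versus character semantics.

```python
# Hypothetical model (not a DB2 API) of how a declared CHAR length maps to
# a byte capacity in a UTF-8 (Unicode) DB2 database.
BYTES_PER_UNIT = {
    "OCTETS": 1,       # length counts bytes
    "CODEUNITS32": 4,  # length counts characters; worst case is 4 bytes each
}


def max_bytes(declared_length: int, string_units: str) -> int:
    """Return the byte capacity assumed for CHAR(declared_length <units>)."""
    return declared_length * BYTES_PER_UNIT[string_units]


for units in ("OCTETS", "CODEUNITS32"):
    print(f"CHAR(1 {units}) -> {max_bytes(1, units)} byte(s)")
```

Under these assumptions, CHAR(1) reserves a single byte with byte semantics but room for up to 4 bytes with character semantics.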

The database configuration parameter string_units determines whether lengths count characters (CODEUNITS32) or bytes (SYSTEM) by default.

Coming back to your question: if you did not specify anything, inserting a special character that needs 2 octets into a CHAR(1) column will likely fail. If CODEUNITS32 was specified, either through string_units or directly in the column definition as CHAR(1 CODEUNITS32), the insert will succeed.
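The following Python sketch illustrates the answer with an umlaut. The `fits` helper is hypothetical and only approximates DB2's behavior: with byte semantics it compares the UTF-8 encoded length, and with CODEUNITS32 it counts Unicode code points (one UTF-32 code unit each). The umlaut is written as the precomposed code point U+00FC so that it is exactly one character.

```python
def fits(value: str, declared_length: int, string_units: str) -> bool:
    """Rough check (hypothetical helper): does value fit in CHAR(declared_length)?"""
    if string_units == "OCTETS":
        # Byte semantics: compare the number of UTF-8 bytes
        return len(value.encode("utf-8")) <= declared_length
    if string_units == "CODEUNITS32":
        # Character semantics: one UTF-32 code unit per Unicode code point
        return len(value) <= declared_length
    raise ValueError(f"unsupported string units: {string_units}")


umlaut = "\u00fc"  # 'ü' as a single precomposed code point
print(len(umlaut.encode("utf-8")))     # 2 octets in UTF-8
print(fits(umlaut, 1, "OCTETS"))       # False: 2 bytes do not fit in 1 byte
print(fits(umlaut, 1, "CODEUNITS32"))  # True: it is a single character
```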