What does "CHAR(1)" in a UTF-8 encoded DB2 database mean?
Can I insert a special character (e.g. one that takes 2 octets in UTF-8) into a column of CHAR(1)?
Or does CHAR(1) in UTF-8 always mean, that it has capacity for one byte / octet, i.e. such that inserting an Umlaut into it will fail ?
I read through this interesting developerWorks article, but it's going to deep for my simple question...
It depends. :)
DB2 introduced code units
to help with designing string-typed columns that are based on number of characters and not number of bytes. The CREATE TABLE
statement has an overview of data types and also explains CHAR
and VARCHAR
. If number of characters is used, DB2 is assuming the worst case - 4 bytes/octets per character - for length computations.
The database configuration string_units
determines whether by default the number of characters (CODEUNITS32
) or the number of bytes (SYSTEM
) is considered.
Coming back to your question: If you did not specify anything, inserting a special character that needs 2 octets into a CHAR(1) will likely fail. If CODEUNITS32 was specified, then it will succeed.