The C standard states:
ISO/IEC 9899:1999, 6.2.5.15 (p. 49)
The three types char, signed char, and
unsigned char are collectively called
the character types. The
implementation shall define char to
have the same range, representation,
and behavior as either signed char or
unsigned char.
And indeed, gcc defines it according to the target platform.
My question is: why does the standard do that? I can see nothing coming out of an ambiguous type definition except hideous and hard-to-spot bugs.
Moreover, in ANSI C (before C99), the only byte-sized type is char, so using char for math is sometimes inevitable. So saying "one should never use char for math" is not quite true. If that were the case, a saner decision would have been to include three types: "char, ubyte, sbyte".
Is there a reason for that, or is it just some weird backwards-compatibility gotcha that allows bad (but common) compilers to be declared standard-compliant?
Leaving the signedness of "plain" char unspecified allows compilers to select whichever representation is more efficient for the target architecture: on some architectures, zero-extending a one-byte value to the size of "int" requires fewer operations (so plain char is made unsigned), while on others the instruction set makes sign-extending more natural, and plain char is implemented as signed.
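To make the difference concrete, here is a minimal sketch of how a program could check which choice its implementation made, and what widening to int yields for each character type. The function names are made up for illustration; only CHAR_MIN and UCHAR_MAX come from <limits.h>.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Each helper stores -1 converted to a character type, then widens it to int.
   (Hypothetical helper names, used only for this illustration.) */

int widen_via_signed(void)
{
    signed char c = -1;               /* stays -1 */
    return c;                         /* sign-extended: -1 */
}

int widen_via_unsigned(void)
{
    unsigned char c = (unsigned char)-1;  /* wraps to UCHAR_MAX */
    return c;                             /* zero-extended: UCHAR_MAX */
}

int widen_via_plain(void)
{
    char c = (char)-1;                /* -1 or UCHAR_MAX, depending on the platform */
    return c;                         /* follows whichever type char behaves like */
}
```

On an implementation where plain_char_is_signed() returns 1, widen_via_plain() should give -1; otherwise it should give UCHAR_MAX (255 with 8-bit bytes). The same source therefore produces different values on different targets, which is exactly the portability trap the question describes.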
Perhaps historically some implementations' "char" were signed and some were unsigned, and so to be compatible with both they couldn't define it as one or the other.
In the good old days when C was defined, the character world was 7-bit, so the sign bit could be used for other things (like EOF).
On some machines, a signed char would be too small to hold all the characters in the C character set (letters, digits, standard punctuation, etc.). On such machines, 'char' must be unsigned. On other machines, an unsigned char can hold values larger than a signed int can (since char and int are the same size). On those machines, 'char' must be signed.
I suppose (off the top of my head) that their thinking was along the following lines:
If you care about the sign of char (using it as a byte) you should explicitly choose signed or unsigned char.
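As a rough illustration of that advice, the sketch below sums the bytes of a buffer through an unsigned char pointer, so the result does not depend on whether plain char is signed. The name sum_bytes is hypothetical and not taken from any library.

```c
#include <stddef.h>

/* Adds up the bytes of a buffer, reading each one as unsigned char.
   Because the element type is explicitly unsigned, every byte contributes
   a value in 0..UCHAR_MAX regardless of how plain char is defined. */
unsigned sum_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;   /* explicit choice: treat data as raw bytes */
    unsigned total = 0;

    for (size_t i = 0; i < len; i++) {
        total += p[i];
    }
    return total;
}
```

For example, sum_bytes("\x01\x02\x03", 3) gives 6. Had the loop read the buffer through a plain char pointer, a byte with the high bit set would contribute a negative value on platforms where char is signed, and the total would differ between targets.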