```c
#include <stdbool.h>  /* needed for bool in C before C23; C++ has bool built in */

int main()
{
    char c = 0xff;
    bool b = 0xff == c;
    // Under most C/C++ compilers' default options, b is FALSE!!!
}
```
Neither the C nor the C++ standard specifies whether `char` is signed or unsigned; it is implementation-defined.
Why don't the C and C++ standards explicitly define `char` as signed or unsigned, to avoid dangerous misuses like the code above?
Historical reasons, mostly.
Expressions of type `char` are promoted to `int` in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain `char` signed.
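You can observe the promotion directly; here is a small sketch (unary `+` applies the integer promotions, and the exact size of `int` is of course platform-dependent):

```c
#include <stdio.h>

int main(void)
{
    char c = 'A';
    /* Unary + applies the integer promotions, so +c has type int. */
    printf("sizeof c  = %zu\n", sizeof c);   /* always 1 */
    printf("sizeof +c = %zu\n", sizeof +c);  /* sizeof(int), typically 4 */
    return 0;
}
```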
On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, `char` pretty much has to be unsigned.
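As a rough illustration of the EBCDIC problem (this sketch doesn't require an EBCDIC host; 0xC1 happens to be the EBCDIC code for 'A'):

```c
#include <stdio.h>

int main(void)
{
    /* 0xC1 is 'A' in EBCDIC. With a signed 8-bit char, storing it
       already gives an implementation-defined result... */
    char a = 0xc1;
    /* ...and using it to index a 256-entry lookup table would go out
       of bounds, since its value would typically be -63. */
    printf("0xC1 stored in a char reads back as %d\n", (int)a);
    return 0;
}
```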
The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 notes that a plain `char` may be represented as either signed or unsigned, depending on the implementation, as in prior practice, and that `signed char` was introduced to make a one-byte signed integer type available on systems where plain `char` is unsigned.

Going back even further, an early version of the C Reference Manual from 1975 describes a `char` as being converted to `int` by propagating its sign through the upper bits, while noting that this sign-propagation feature disappears in other implementations.
This description is more implementation-specific than what we see in later documents, but it does acknowledge that `char` may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a `char` object to `int` would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the `signed` or `unsigned` keyword.)

C's immediate predecessor was a language called B. B was a typeless language, so the question of `char` being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page, now moved here.

As for what's happening in your code (applying modern C rules):
If plain `char` is unsigned, then the initialization of `c` sets it to `(char)0xff`, which compares equal to `0xff` in the second line. But if plain `char` is signed, then `0xff` (an expression of type `int`) is converted to `char` -- but since `0xff` exceeds `CHAR_MAX` (assuming `CHAR_BIT == 8`), the result is implementation-defined. In most implementations, the result is `-1`. In the comparison `0xff == c`, both operands are converted to `int`, making it equivalent to `0xff == -1`, or `255 == -1`, which is of course false.
Another important thing to note is that `unsigned char`, `signed char`, and (plain) `char` are three distinct types. `char` has the same representation as either `unsigned char` or `signed char`; it's implementation-defined which one it is. (On the other hand, `signed int` and `int` are two names for the same type; `unsigned int` is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain `int` is signed or unsigned.))

Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code and, to a lesser extent, existing implementations.
`char` was originally meant to store characters, so whether it's signed or unsigned is not important. What really matters is how to perform maths on `char` efficiently. So, depending on the system, the compiler will choose what's most appropriate.

In fact, a lot of ARM compilers still use `unsigned char` by default, because even though you can load a byte with sign extension on modern ARM ISAs, that instruction is still less flexible than the zero-extension version. `char` is `unsigned` by default on the Android NDK.

Most modern compilers also allow you to change char's signedness instead of using the default setting.