In C++, sizeof('a') == sizeof(char) == 1. This makes intuitive sense, since 'a' is a character literal and sizeof(char) == 1 as defined by the standard.

In C, however, sizeof('a') == sizeof(int). That is, it appears that C character literals are actually integers. Does anyone know why? I can find plenty of mentions of this C quirk but no explanation for why it exists.
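For anyone who wants to see it directly, here is a small test program of my own (not from the original post) that can be compiled as both C and C++; the exact value of sizeof(int) depends on the platform:

    #include <stdio.h>

    int main(void) {
        /* Compiled as C this prints sizeof(int) for 'a' (typically 4);
           compiled as C++ it prints 1. */
        printf("sizeof('a')  = %lu\n", (unsigned long)sizeof('a'));
        printf("sizeof(char) = %lu\n", (unsigned long)sizeof(char));
        printf("sizeof(int)  = %lu\n", (unsigned long)sizeof(int));
        return 0;
    }

On a typical platform the C build prints 4, 1, 4 and the C++ build prints 1, 1, 4.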
I haven't seen a rationale for it (C char literals being int types), but here's something Stroustrup had to say about it (from Design and Evolution 11.2.1 - Fine-Grain Resolution):
So for the most part, it should cause no problems.
The historical reason for this is that C, and its predecessor B, were originally developed on various models of DEC PDP minicomputers with various word sizes, which supported 8-bit ASCII but could only perform arithmetic on registers. (Not the PDP-11, however; that came later.) Early versions of C defined int to be the native word size of the machine, and any value smaller than an int needed to be widened to int in order to be passed to or from a function, or used in a bitwise, logical or arithmetic expression, because that was how the underlying hardware worked.

That is also why the integer promotion rules still say that any data type smaller than an int is promoted to int.
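You can see those promotion rules at work in modern C with a quick check like this (my own sketch, not part of the original answer):

    #include <stdio.h>

    int main(void) {
        char c = 'x';
        /* c itself is a char, but as soon as it appears in an arithmetic
           expression it is promoted to int, so c + 0 has type int. */
        printf("sizeof(c)     = %lu\n", (unsigned long)sizeof(c));
        printf("sizeof(c + 0) = %lu\n", (unsigned long)sizeof(c + 0));
        return 0;
    }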
C implementations are also allowed to use one's-complement math instead of two's-complement for similar historical reasons. The reason that octal character escapes and octal constants are first-class citizens compared to hex is likewise that those early DEC minicomputers had word sizes divisible into three-bit groups but not four-bit nibbles.

I remember reading K&R and seeing a code snippet that would read a character at a time until it hit EOF. Since every char value can legitimately appear in a file or input stream, EOF cannot be any char value. What the code did was to put the read character into an int, then test for EOF, then convert it to a char if it wasn't.
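From memory, the K&R loop looks roughly like this (a paraphrase, not the exact listing from the book):

    #include <stdio.h>

    int main(void) {
        int c;  /* int, not char, so EOF can be told apart from every valid character */

        while ((c = getchar()) != EOF) {
            putchar(c);  /* at this point the value is known to fit in a char */
        }
        return 0;
    }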
I realize this doesn't exactly answer your question, but it would make some sense for the rest of the character literals to be sizeof(int) if the EOF literal was.
I don't know the specific reasons why a character literal in C is of type int. But in C++, there is a good reason not to go that way. Consider this:
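Something along these lines (a minimal sketch of the kind of overload set being described; the bodies are just for illustration):

    #include <iostream>

    void print(int x)  { std::cout << "int: "  << x << '\n'; }
    void print(char x) { std::cout << "char: " << x << '\n'; }

    int main() {
        print('a');  // in C++ 'a' has type char, so this picks the second overload
        return 0;
    }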
You would expect the call to print to select the second version, the one taking a char. Having character literals be of type int would make that impossible. Note that in C++, literals with more than one character still have type int, although their value is implementation-defined. So 'ab' has type int, while 'a' has type char.
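A quick way to check this in C++ (multicharacter literals are conditionally-supported and most compilers warn about them):

    #include <iostream>

    int main() {
        std::cout << sizeof('a')  << '\n';  // 1: 'a' is a char in C++
        std::cout << sizeof('ab') << '\n';  // sizeof(int): the multicharacter literal 'ab' has type int
        return 0;
    }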
I don't know, but I'm going to guess it was easier to implement it that way and it didn't really matter. It wasn't until C++, when the type could determine which function would get called, that it needed to be fixed.
Indeed, I didn't know this. Before prototypes existed, anything narrower than an int was converted to an int when used as a function argument. That may be part of the explanation.
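That conversion still happens today under the name "default argument promotions", for example in the variadic part of a printf call; here is a small illustration of my own (not from the comment above):

    #include <stdio.h>

    int main(void) {
        char c = 'a';
        /* Arguments matching the "..." of printf have no declared parameter
           type, so the default argument promotions apply and c is passed
           as an int; that is why "%d" is the matching conversion. */
        printf("%d\n", c);  /* prints 97 on an ASCII system */
        return 0;
    }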