Why does C allow accessing an object through any "character type":
6.5 Expressions (C)
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a character type.
but C++ only allows char and unsigned char?
3.10 Lvalues and rvalues (C++)
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- a char or unsigned char type.
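A minimal C++ sketch of the kind of access both rules permit: the bytes of an object are read through an lvalue of type unsigned char, which appears on the C list and the C++ list alike. The helper name object_bytes is hypothetical and only serves as an illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of the object representation of `obj`.
// Each byte is read through an lvalue of type unsigned char, which both the
// C (6.5) and C++ (3.10) rules allow for an object of any type.
template <typename T>
std::vector<unsigned char> object_bytes(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(p, p + sizeof(T));
}
```

For a std::uint32_t holding 0x01020304 on a platform with 8-bit bytes, the vector contains 0x01, 0x02, 0x03 and 0x04 in an order that depends on endianness.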
Another portion of signed char hatred (quote from the C++ standard):
3.9 Types (C++)
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
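A sketch of the round trip this paragraph guarantees, assuming a hypothetical trivially copyable struct Point and a hypothetical helper round_trip: the bytes are copied out into an unsigned char array, the object is overwritten, and copying the bytes back restores the original value.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Illustrative type; any trivially copyable type would do.
struct Point { int x; double y; };
static_assert(std::is_trivially_copyable<Point>::value,
              "the 3.9 guarantee only covers trivially copyable types");

// Hypothetical helper: saves p's bytes, clobbers p, then restores the bytes.
Point round_trip(Point p) {
    unsigned char buf[sizeof(Point)];
    std::memcpy(buf, &p, sizeof p);   // object -> array of unsigned char
    p = Point{-1, -1.0};              // overwrite the original value
    std::memcpy(&p, buf, sizeof p);   // array -> object: original value is back
    return p;
}
```

The copy includes any padding bytes inside Point; the rule covers those too, because it talks about "the underlying bytes making up the object".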
And from the C standard:
6.2.6 Representations of types (C)
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
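The quoted paragraph distinguishes a value from its object representation. A small sketch of why that distinction matters, assuming IEEE 754 doubles: +0.0 and -0.0 compare equal as values but have different object representations, which std::memcmp (it compares bytes as unsigned char) can see. The helper name same_representation is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Assumption: IEEE 754 doubles, where +0.0 and -0.0 differ only in the sign bit.
static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 doubles");

// Hypothetical helper: true if a and b have byte-for-byte identical
// object representations. std::memcmp compares the bytes as unsigned char.
bool same_representation(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}
```

So 0.0 == -0.0 holds, while same_representation(0.0, -0.0) is false.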
I see many people on Stack Overflow saying this is because unsigned char is the only character type that is guaranteed not to have padding bits, but C99 section 6.2.6.2 (Integer types) says
signed char shall not have any padding bits
So what is the real reason behind this?
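A quick, hedged sanity check of the padding claim on a given implementation: std::numeric_limits<T>::digits counts value bits only (excluding the sign bit for signed types), so for a character type with no padding bits, the value bits plus the sign bit must account for all CHAR_BIT bits. This only checks what the implementation reports; the function names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Value bits of unsigned char: every bit, if there is no padding.
constexpr int unsigned_char_bits() { return std::numeric_limits<unsigned char>::digits; }

// Value bits of signed char plus its sign bit.
constexpr int signed_char_bits() { return std::numeric_limits<signed char>::digits + 1; }

static_assert(unsigned_char_bits() == CHAR_BIT, "unsigned char has no padding bits");
static_assert(signed_char_bits() == CHAR_BIT, "signed char has no padding bits");
```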
Here's my take on the motivation:
On a non-two's-complement system, signed char will not be suitable for accessing the representation of an object. This is because either there are two possible signed char representations which have the same value (+0 and -0), or one representation that has no value (a trap representation). In either case, this prevents you from doing most meaningful things you might do with the representation of an object. For example, if you have a 16-bit unsigned integer 0x80ff, one or the other byte, as a signed char, is going to either trap or compare equal to 0.

Note that on such an implementation (non-two's-complement), plain char needs to be defined as an unsigned type for accessing the representations of objects via char to work correctly. While there's no explicit requirement, I see this as a requirement derived from other requirements in the standard.

I think what you're really asking is why signed char is disqualified from all the rules allowing type-punning to char* as a special case. To be honest, I don't know, especially since, as far as I can tell, signed char cannot have padding either.

Empirical evidence suggests that it's not much more than convention: char is seen as a byte of ASCII; unsigned char is seen as a byte with arbitrary "binary" content; and signed char is left flapping in the wind. To me, it doesn't seem like enough of a reason to exclude it from these standard rules, but I honestly can't find any evidence to the contrary. I'm going to put it down to a mildly inexplicable oddity in the standard wording.
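To make the 0x80ff argument from the first answer concrete, here is a sketch that decodes its two bytes, 0x80 and 0xff, under each of the three signed representations C99 permits. The decoder names are hypothetical, and the sketch treats every bit pattern as a normal value (it models no trap representations), so a negative zero simply comes back as 0.

```cpp
#include <cassert>

// Hypothetical decoders for an 8-bit pattern under each signed representation.
int as_twos_complement(unsigned bits) {
    bits &= 0xFFu;
    return (bits & 0x80u) ? static_cast<int>(bits) - 256 : static_cast<int>(bits);
}

int as_ones_complement(unsigned bits) {
    bits &= 0xFFu;
    // Negative values are the bitwise complement of their magnitude.
    return (bits & 0x80u) ? -static_cast<int>(~bits & 0xFFu) : static_cast<int>(bits);
}

int as_sign_magnitude(unsigned bits) {
    bits &= 0xFFu;
    // The top bit is the sign; the remaining 7 bits are the magnitude.
    return (bits & 0x80u) ? -static_cast<int>(bits & 0x7Fu) : static_cast<int>(bits);
}
```

Under ones' complement, 0xff decodes to -0; under sign-magnitude, 0x80 does. Only two's complement gives both bytes distinct non-zero values (-128 and -1). C++20 and C23 both require two's complement for signed integers, so the other two cases are now mostly of historical interest.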
(It may be that we have to ask the std-discussion list about this.)

The use of a character type to inspect the representations of objects is a hack. However, it is historical, and some accommodation must be made to allow it.
Mostly, in programming languages, we want strong typing. Something that is a float should be accessed as a float and not as an int. This has a number of benefits, including reducing human errors and enabling various optimizations.

However, there are times when it is necessary to access or modify the bytes of an object. In C, this was done through character types. C++ continues that tradition, but it improves the situation slightly by eliminating the use of signed char for these purposes.

Ideally, it might have been better to create a new type, say byte, and to allow byte access to object representations only through this type, thus leaving the regular character types for use only as normal integers/characters. Perhaps it was thought there was too much existing code using char and unsigned char to support such a change. However, I have never seen signed char used to access the representation of an object, so it was safe to exclude it.
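For what it's worth, C++17 later added something close to the separate byte type suggested here: std::byte in <cstddef>, which the C++17 version of the aliasing rule lists alongside char and unsigned char. A minimal sketch (requires C++17; the helper name byte_sum is hypothetical) that reads an object's bytes through const std::byte*:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sums the bytes of an object's representation,
// reading them through const std::byte*. std::byte supports bitwise
// operators but no arithmetic, so each byte is converted explicitly
// with std::to_integer.
unsigned byte_sum(const void* obj, std::size_t n) {
    const std::byte* p = static_cast<const std::byte*>(obj);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::to_integer<unsigned>(p[i]);
    return sum;
}
```

Because std::byte is not a character or arithmetic type, code like `p[0] + 1` does not compile, which is roughly the separation between "raw bytes" and "small integers/characters" the answer describes.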