I recently read that the differences between char, unsigned char and signed char are platform specific. I can't quite get my head round this. Does it mean the bit sequence can vary from one platform to the next, i.e. on platform 1 the sign is the first bit, while on platform 2 the sign could be at the end? How would you code against this?

Basically my question comes from seeing this line:

typedef unsigned char byte;

I don't understand the relevance of the signedness.
Perhaps you are referring to the fact that the signedness of char is compiler/platform specific. Here is a blog entry that sheds some light on it: Character types in C and C++
It's more correct to say that it's compiler-specific, and you should not count on char being signed or unsigned when using char without a signed or unsigned qualifier.

Otherwise you would face the following problem: you write and debug the program assuming that char is signed by default, and then it is recompiled with a compiler that assumes otherwise, and the program behaviour changes drastically. If you rely on this assumption even occasionally in your code, you risk unintended behaviour that is only triggered under specific conditions and is very hard to detect and debug.

Having a signed char is more of a fluke of how all the basic variable types are handled in C; generally it is not actually useful to have negative characters.
A signed char is always signed; it is at least 8 bits wide (exactly CHAR_BIT bits, which is 8 on virtually every modern platform), and where the sign bit sits is part of the platform's integer representation, not something you choose.
An unsigned char has the same width and no sign bit.
A plain char, however, is not always unsigned: whether it behaves as signed or unsigned is left to the implementation, and many common compilers (GCC on x86, for example) actually default to a signed char.
You misunderstood something. signed char is always signed. unsigned char is always unsigned. But whether plain char is signed or unsigned is implementation specific: it depends on your compiler. This is different from the int types, which are all signed (int is the same as signed int, and short is the same as signed short). More interesting is that char, signed char and unsigned char are treated as three distinct types for the purposes of function overloading. It means that you can have three function overloads in the same compilation unit, one for each type. For the int types it is the contrary: you can't have separate overloads for int and signed int, because they name the same type.
Let's assume that your platform has eight-bit bytes, and suppose we have the bit pattern 10101010. To a signed char, that value is −86. For unsigned char, though, that same bit pattern represents 170. We haven't moved any bits around; it's the same bits, interpreted two different ways.

Now for char. The standard doesn't say which of those two interpretations should be correct. A char holding the bit pattern 10101010 could be either −86 or 170. It's going to be one of those two values, but you have to know the compiler and the platform before you can predict which it will be. Some compilers offer a command-line switch to control which one it will be. Some compilers have different defaults depending on what OS they're running on, so they can match the OS convention.

In most code, it really shouldn't matter. They are treated as three distinct types for the purposes of overloading, and pointers to one of those types aren't compatible with pointers to another type. Try calling strlen with a signed char* or an unsigned char*; it won't compile.

Use signed char when you want a one-byte signed numeric type, and use unsigned char when you want a one-byte unsigned numeric type. Use plain old char when you want to hold characters. That's what the programmer was thinking when writing the typedef you're asking about. The name "byte" doesn't have the connotation of holding character data, whereas the name "unsigned char" has the word "char" in it, and that causes some people to think it's a good type for holding characters, or that it's a good idea to compare it with variables of type char.

Since you're unlikely to do general arithmetic on characters, it won't matter whether char is signed or unsigned on any of the platforms and compilers you use.