What does it mean for a char to be signed?

2019-03-10 05:59发布

问题:

Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.

回答1:

It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.

In fact, when working in constrained memory environments, like embedded 8 bit applications a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.



回答2:

In terms of the values they represent:

unsigned char:

  • spans the value range 0..255 (00000000..11111111)
  • values overflow around low edge as:

    0 - 1 = 255 (00000000 - 00000001 = 11111111)

  • values overflow around high edge as:

    255 + 1 = 0 (11111111 + 00000001 = 00000000)

  • bitwise right shift operator (>>) does a logical shift:

    10000000 >> 1 = 01000000 (128 / 2 = 64)

signed char:

  • spans the value range -128..127 (10000000..01111111)
  • values overflow around low edge as:

    -128 - 1 = 127 (10000000 - 00000001 = 01111111)

  • values overflow around high edge as:

    127 + 1 = -128 (01111111 + 00000001 = 10000000)

  • bitwise right shift operator (>>) does an arithmetic shift:

    10000000 >> 1 = 11000000 (-128 / 2 = -64)

I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (expect for right shifts).

Update

Some implementation-specific behaviour mentioned in the comments:

  • char != signed char. The type "char" without "signed" or "unsinged" is implementation-defined which means that it can act like a signed or unsigned type.
  • Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.


回答3:

#include <stdio.h>

int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;
    signed char sa = 'A';
    signed char sb = 0xFF;
    unsigned char ua = 'A';
    unsigned char ub = 0xFF;
    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}


[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false

It's important when sorting strings.



回答4:

There are a couple of difference. Most importantly, if you overflow the valid range of a char by assigning it a too big or small integer, and char is signed, the resulting value is implementation defined or even some signal (in C) could be risen, as for all signed types. Contrast that to the case when you assign something too big or small to an unsigned char: the value wraps around, you will get precisely defined semantics. For example, assigning a -1 to an unsigned char, you will get an UCHAR_MAX. So whenever you have a byte as in a number from 0 to 2^CHAR_BIT, you should really use unsigned char to store it.

The sign also makes a difference when passing to vararg functions:

char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);

Assume the value assigned to c would be too big for char to represent, and the machine uses two's complement. Many implementation behave for the case that you assign a too big value to the char, in that the bit-pattern won't change. If an int will be able to represent all values of char (which it is for most implementations), then the char is being promoted to int before passing to printf. So, the value of what is passed would be negative. Promoting to int would retain that sign. So you will get a negative result. However, if char is unsigned, then the value is unsigned, and promoting to an int will yield a positive int. You can use unsigned char, then you will get precisely defined behavior for both the assignment to the variable, and passing to printf which will then print something positive.

Note that a char, unsigned and signed char all are at least 8 bits wide. There is no requirement that char is exactly 8 bits wide. However, for most systems that's true, but for some, you will find they use 32bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C also is not always exactly 8 bits.

Another difference is, that in C, a unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 .. 2^CHAR_BIT-1. THe same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values, even if you know how your compiler implements the sign stuff (two's complement or the other options), there may be unused padding bits in it. In C++, there are no padding bits for all three character types.



回答5:

"What does it mean for a char to be signed?"

Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8 bit EBCIDIC.)

When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.

A "signed character" happens to be perfect for this representation.

Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.



回答6:

Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:

  • converting to a larger int
  • comparison functions

The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.



回答7:

Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).

As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.

I used to do that, rather. Now the newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned char types anyway).



回答8:

The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.

char a = (char)42;
char b = (char)120;
char c = a + b;

Depending on the signedness of the char, c could be one of two values. If char's are unsigned then c will be (char)162. If they are signed then it will an overflow case as the max value for a signed char is 128. I'm guessing most implementations would just return (char)-32.



回答9:

One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ascii char. Of course, it's not portable, so not very useful.