Why is char neither signed nor unsigned, but wchar_

Posted 2019-04-26 17:12

Question:

The following C++ program compiles without errors:

void f(char){}
void f(signed char){}
void f(unsigned char){}
int main(){}  

The wchar_t version of the same program does not:

void f(wchar_t){}
void f(signed wchar_t){}
void f(unsigned wchar_t){}
int main(){}

error: redefinition of ‘void f(wchar_t)’
void f(signed wchar_t){}

It seems that wchar_t is unsigned.
Why is there an inconsistency in overloading?

Answer 1:

Plain char, signed char, and unsigned char are three distinct types, so functions can be overloaded on them:

[basic.fundamental] / 1

[...] Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types. [...]

wchar_t is also a distinct type, but it cannot be qualified with signed or unsigned, which can only be used with the standard integer types.

[dcl.type] / 2

As a general rule, at most one type-specifier is allowed in the complete decl-specifier-seq of a declaration or in a type-specifier-seq or trailing-type-specifier-seq. The only exceptions to this rule are the following:

[...]

signed or unsigned can be combined with char, long, short, or int.

[dcl.type.simple] / 2

[...] Table 9 summarizes the valid combinations of simple-type-specifiers and the types they specify.

The signedness of wchar_t is implementation-defined:

[basic.fundamental] / 5

[...] Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.
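
The quoted rules can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the names Kind, which, plain_char_is_signed, and wchar_is_signed are made up for illustration and do not come from the standard.

```cpp
#include <type_traits>

// Hypothetical tag used only to report which overload was selected.
enum class Kind { Plain, Signed, Unsigned };

// Three overloads can coexist because the three narrow character
// types are distinct types.
inline Kind which(char)          { return Kind::Plain; }
inline Kind which(signed char)   { return Kind::Signed; }
inline Kind which(unsigned char) { return Kind::Unsigned; }

// Plain char is never the same type as signed char or unsigned char,
// whatever its signedness happens to be on this platform.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// The signedness of plain char and of wchar_t is implementation-defined,
// so query it instead of assuming a value.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;
constexpr bool wchar_is_signed      = std::is_signed<wchar_t>::value;
```

Calling which('a') selects the plain char overload on every platform, because a character literal has type char. The two constants may differ between compilers and targets.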



Answer 2:

char is a distinct type from both signed char and unsigned char. wchar_t is yet another distinct type (for type identity purposes), but it has exactly the same properties (size, signedness, and alignment) as some other integral type.

From ISO 14882:2003, 3.9.1:

Plain char, signed char, and unsigned char are three distinct types.

(...)

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). Type wchar_t shall have the same size, signedness, and alignment requirements (3.9) as one of the other integral types, called its underlying type.

There is no such thing as signed wchar_t or unsigned wchar_t. It is not mentioned anywhere in the document.
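
A minimal sketch of the difference between type identity and shared properties, assuming a conforming C++11 compiler. The helper names is_wide, same_layout_as_wchar, and some_standard_integer_matches are hypothetical. Some compilers have a non-conforming mode that makes wchar_t a typedef, in which case the first assertions would fail.

```cpp
#include <type_traits>

// wchar_t is its own type, not an alias for an integer type.
static_assert(!std::is_same<wchar_t, int>::value,
              "wchar_t is distinct from int");
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is distinct from unsigned short");

// Because it is distinct, it can be overloaded alongside integer types.
inline bool is_wide(wchar_t)      { return true; }
inline bool is_wide(int)          { return false; }
inline bool is_wide(unsigned int) { return false; }

// True if T has the same size, alignment, and signedness as wchar_t.
template <typename T>
constexpr bool same_layout_as_wchar() {
    return sizeof(T) == sizeof(wchar_t)
        && alignof(T) == alignof(wchar_t)
        && std::is_signed<T>::value == std::is_signed<wchar_t>::value;
}

// On common platforms one of these is the underlying type; the standard
// only promises "one of the other integral types", so this is not asserted.
constexpr bool some_standard_integer_matches =
    same_layout_as_wchar<short>() || same_layout_as_wchar<unsigned short>() ||
    same_layout_as_wchar<int>()   || same_layout_as_wchar<unsigned int>()   ||
    same_layout_as_wchar<long>()  || same_layout_as_wchar<unsigned long>();
```

Here is_wide(L'x') picks the wchar_t overload by exact match, while is_wide(0) picks the int overload, even on a platform where the two types have identical size and signedness.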



Answer 3:

char is a fundamental type. wchar_t started out as a library solution (a typedef in C) and later became a built-in type in C++, with an underlying type corresponding to the type that was earlier used to typedef it:

C++11 §3.9.1/5

Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.

This explains why you cannot change the signedness of wchar_t, but it does not explain why there is a char type with unspecified signedness.


Also, making plain char signed, as most compilers do by default, is impractical for several reasons. One reason is that negative values are annoying and generally have to be cast to unsigned before they can be compared. Another reason is that the C character classification functions require non-negative values (except when being passed EOF). A third reason is that on old sign-and-magnitude or one's complement machines there's one unusable value.
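
The second reason is the one most often seen in practice. A minimal sketch of the usual workaround, assuming the default "C" locale; the wrapper names is_alpha_safe and to_upper_safe are hypothetical:

```cpp
#include <cctype>

// std::isalpha and std::toupper take an int whose value must be
// representable as unsigned char (or equal EOF). Where plain char is
// signed, a char holding a byte >= 0x80 is negative, so convert it to
// unsigned char before the call.
inline bool is_alpha_safe(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

inline char to_upper_safe(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Without the cast, passing a negative char value to these functions is undefined behavior. With it, the call is well-defined whether plain char is signed or unsigned.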

There may be some explanation of that in Stroustrup's “The Design and Evolution of C++”, but I doubt it.

It sounds like frozen history, something that at one point made some kind of sense, for the technology at the time.