According to C11 WG14 draft version N1570:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Is the following undefined behaviour?
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
int main(void) {
    char c = CHAR_MIN; /* assume that char is signed, so CHAR_MIN < 0 */
    return isspace(c) ? EXIT_FAILURE : EXIT_SUCCESS;
}
Does the standard allow passing a char to isspace() (char converted to int)? In other words, is a char, after conversion to int, representable as an unsigned char?
Here's how Wiktionary defines "representable":
Capable of being represented.
Is char capable of being represented as unsigned char? Yes. §6.2.6.1/4:
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
sizeof(char) == 1, therefore its object representation is unsigned char[1], i.e., char is capable of being represented as an unsigned char. Where am I wrong?
A concrete example: I can represent [-2, -1, 0, 1] as [0, 1, 2, 3]. If I can't, then why not?
Related: According to §6.3.1.3, isspace((unsigned char)c) is portable if INT_MAX >= UCHAR_MAX; otherwise it is implementation-defined.
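A minimal sketch of the cast mentioned in the related note, assuming the common case where INT_MAX >= UCHAR_MAX. The wrapper name is_space_char is hypothetical and not part of <ctype.h>:

```c
#include <ctype.h>

/* Hypothetical wrapper: convert the char to unsigned char first, so the
   int that isspace() receives lies in the range 0..UCHAR_MAX
   (assumption: INT_MAX >= UCHAR_MAX, as on common platforms). */
static int is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

With this wrapper, a negative char such as CHAR_MIN on a signed-char platform is mapped to a non-negative value before it reaches isspace().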
Under the assumption that char is signed, this would be undefined behavior; otherwise it is well defined, since CHAR_MIN would have the value 0. It is easier to see the intention and meaning of the requirement you quoted if we read section 7.4 Character handling <ctype.h> from the Rationale for International Standard—Programming Languages—C, which says (emphasis mine going forward):
So the valid values are the non-negative values representable as an unsigned char, and EOF, which is some implementation-defined negative number.
Even though this is the C99 rationale, the particular wording you are referring to does not change from C99 to C11, so the rationale still fits.
We can also find why the interface uses int as an argument, as opposed to char, in section 7.1.4 Use of library functions, which says:
Re-formulated, a type is a convention for what the underlying bit-patterns mean. A value is thus representable in a type if that type assigns some bit-pattern that meaning.
A conversion (which might need a cast) is a mapping from a value (represented with a specific type) to a value (possibly different) represented in the target type.
Under the given assumption (that char is signed), CHAR_MIN is certainly negative, and the text you quoted leaves no room for interpretation: Yes, it is undefined behavior, as unsigned char cannot represent any negative numbers.
If that assumption did not hold, your program would be well-defined, because CHAR_MIN would be 0, a valid value for unsigned char.
Thus, we have a case where it is implementation-defined whether the program is undefined or well-defined.
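The rule from the quoted <ctype.h> paragraph can be sketched as a predicate. This is only an illustration: the function name is_valid_ctype_arg is hypothetical, and the sketch assumes INT_MAX >= UCHAR_MAX.

```c
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: returns 1 if v may be passed to a <ctype.h>
   function, i.e. v equals EOF or is representable as an unsigned char.
   Assumption: INT_MAX >= UCHAR_MAX. */
static int is_valid_ctype_arg(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

On a platform where char is signed, is_valid_ctype_arg(CHAR_MIN) returns 0, which matches the conclusion that isspace(CHAR_MIN) is undefined there.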
As an aside, there is no guarantee that sizeof(int)>1 or INT_MAX >= CHAR_MAX, so int might not be able to represent all values possible for unsigned char.
As conversions are defined to be value-preserving, a signed char can always be converted to int.
But if it was negative, that does not change the impossibility of representing a negative value as an unsigned char. (The conversion is defined, as conversion from any integral type to any unsigned integral type is always defined, though narrowing conversions need a cast.)
The revealing quote (for me) is §6.3.1.3/1:
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
i.e., if the value has to be changed, then the value can't be represented by the new type.
Therefore an unsigned type can't represent a negative value.
To answer the question in the title: "representable" refers to "can be represented" from §6.3.1.3 and is unrelated to "object representation" from §6.2.6.1.
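The difference between the two notions can be sketched in C. The helper names value_after_conversion and object_byte are hypothetical, chosen only for this illustration:

```c
#include <limits.h>
#include <string.h>

/* Value conversion (§6.3.1.3): a negative value cannot be represented by
   unsigned char, so the conversion yields a different value. */
static int value_after_conversion(signed char c)
{
    return (unsigned char)c;
}

/* Object representation (§6.2.6.1): the byte is copied as-is with memcpy;
   no value conversion takes place. */
static unsigned char object_byte(signed char c)
{
    unsigned char byte;
    memcpy(&byte, &c, 1);
    return byte;
}
```

Here value_after_conversion(-1) yields UCHAR_MAX rather than -1: the value had to change, which is exactly what "not representable" means. That object_byte can always copy the byte says nothing about whether the value itself is representable.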
It seems trivial in retrospect. I might have been confused by the habit of treating b'\xFF', 0xff, 255, and -1 as the same byte in Python, and by the disbelief that it is undefined behavior to pass a character to a character classification function, e.g., isspace(CHAR_MIN).