What does “representable” mean in C11?

Posted 2020-02-01 07:54

According to C11 WG14 draft version N1570:

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

Is the following undefined behaviour?

#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

int main(void) {
  char c = CHAR_MIN; /* let's assume that char is signed and CHAR_MIN < 0 */
  return isspace(c) ? EXIT_FAILURE : EXIT_SUCCESS;
}

Does the standard allow passing a char to isspace() (char converted to int)? In other words, is a char, after conversion to int, representable as an unsigned char?


Here's how wiktionary defines "representable":

Capable of being represented.

Is char capable of being represented as unsigned char? Yes. §6.2.6.1/4:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

sizeof(char) == 1 therefore its object representation is unsigned char[1] i.e., char is capable of being represented as an unsigned char. Where am I wrong?

Concrete example, I can represent [-2, -1, 0, 1] as [0, 1, 2, 3]. If I can't then why?


Related: according to §6.3.1.3, isspace((unsigned char)c) is portable if INT_MAX >= UCHAR_MAX; otherwise the result is implementation-defined.

3 answers
Fickle 薄情
#2 · 2020-02-01 08:28

If char is signed, this is undefined behavior; otherwise it is well defined, since CHAR_MIN would have the value 0. It is easier to see the intention and meaning of:

the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF

if we read section 7.4 Character handling <ctype.h> from the Rationale for International Standard—Programming Languages—C which says (emphasis mine going forward):

Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF. EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code. These macros may thus be efficiently implemented by using the argument as an index into a small array of attributes.

So valid values are:

  1. Non-negative integers that fit into unsigned char (0 through UCHAR_MAX)
  2. EOF, which is some implementation-defined negative number

Although this is the C99 rationale, the wording you are referring to did not change from C99 to C11, so the rationale still applies.

Section 7.1.4, Use of library functions, also explains why the interface takes an int argument rather than a char:
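
To make the array-index remark in the Rationale concrete, here is a minimal sketch of a table-driven classifier. The names my_isspace, attr_table and MY_SPACE are hypothetical and do not come from any real C library; the table only marks the six whitespace characters of the "C" locale. It is an illustration of the idea, not how any particular libc does it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical attribute bit, for illustration only. */
#define MY_SPACE 0x01u

/* One slot per value representable in unsigned char: 0..UCHAR_MAX. */
static unsigned char attr_table[(size_t)UCHAR_MAX + 1];
static int attr_ready;

static void init_attr_table(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    for (const char *p = spaces; *p != '\0'; ++p)
        attr_table[(unsigned char)*p] = MY_SPACE;
    attr_ready = 1;
}

/* Sketch of a table-driven isspace for the "C" locale. */
int my_isspace(int c)
{
    if (!attr_ready)
        init_attr_table();
    if (c == EOF)
        return 0;
    /* Any other negative value would index before the start of the
       table. A real implementation does not check; that is the
       undefined behavior. This sketch traps it instead. */
    assert(c >= 0 && c <= UCHAR_MAX);
    return attr_table[c] & MY_SPACE;
}
```

Calling my_isspace(' ') yields a nonzero value and my_isspace('a') yields 0. With a signed char, my_isspace(CHAR_MIN) would trip the assertion, which is exactly the case the standard leaves undefined.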

All library prototypes are specified in terms of the “widened” types: an argument formerly declared as char is now written as int. This ensures that most library functions can be called with or without a prototype in scope, thus maintaining backwards compatibility with pre-C89 code. Note, however, that since functions like printf and scanf use variable-length argument lists, they must be called in the scope of a prototype.

Ridiculous、
#3 · 2020-02-01 08:35

What does representable in a type mean?

Put differently, a type is a convention for what the underlying bit patterns mean. A value is thus representable in a type if that type assigns some bit pattern that meaning.

A conversion (which might need a cast) is a mapping from a value represented in one type to a (possibly different) value represented in the target type.


Under the given assumption (that char is signed), CHAR_MIN is certainly negative, and the text you quoted leaves no room for interpretation:
Yes, it is undefined behavior, as unsigned char cannot represent any negative numbers.

If that assumption did not hold, your program would be well-defined, because CHAR_MIN would be 0, a valid value for unsigned char.

Thus, we have a case where it is implementation-defined whether the program is undefined or well-defined.


As an aside, there is no guarantee that sizeof(int) > 1 or that INT_MAX >= UCHAR_MAX, so int might not be able to represent all values possible for unsigned char.

Because integer conversions preserve the value whenever the target type can represent it, a signed char can always be converted to int.
But if the value was negative, that does not change the impossibility of representing it as an unsigned char. (The conversion itself is defined, because conversion from any integer type to any unsigned integer type is always defined. C does not require a cast for a narrowing conversion, although a cast makes it explicit and silences compiler warnings.)
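
The points above can be sketched in a few lines of C. The helper names widen, in_ctype_domain and is_space_char are hypothetical, and the last one assumes INT_MAX >= UCHAR_MAX, as on every mainstream implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* signed char -> int always preserves the value. */
int widen(signed char sc)
{
    return sc;
}

/* Is v inside the domain that <ctype.h> functions accept? */
int in_ctype_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Safe way to classify a plain char: convert to unsigned char first,
   so the int that reaches isspace is never negative. */
int is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}

/* Small self-check of the claims above. */
void check_ctype_domain(void)
{
    assert(widen(-5) == -5);           /* value preserved */
    assert(in_ctype_domain(widen(65)));
    assert(is_space_char(' '));
}
```

With a signed char, widen(CHAR_MIN) is a negative int other than EOF on typical implementations, so in_ctype_domain reports it as outside the domain, whereas is_space_char((char)CHAR_MIN) is fine because the cast maps it into 0..UCHAR_MAX first.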

家丑人穷心不美
#4 · 2020-02-01 08:41

The revealing quote (for me) is §6.3.1.3/1:

if the value can be represented by the new type, it is unchanged.

i.e., if the value has to be changed then the value can't be represented by the new type.

Therefore an unsigned type can't represent a negative value.

To answer the question in the title: "representable" refers to "can be represented" from §6.3.1.3 and is unrelated to "object representation" from §6.2.6.1.
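
As a rough illustration of that distinction, the sketch below contrasts converting a value with copying its bytes. The function names by_value, by_bytes and fits_uchar are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Value conversion (§6.3.1.3): the result is defined arithmetically.
   A negative value is reduced modulo UCHAR_MAX + 1, so it changes. */
unsigned char by_value(signed char sc)
{
    return (unsigned char)sc;
}

/* Object representation (§6.2.6.1): the raw byte is copied as-is.
   What it holds for a negative value depends on how the
   implementation represents signed integers. */
unsigned char by_bytes(signed char sc)
{
    unsigned char byte;
    memcpy(&byte, &sc, 1);
    return byte;
}

/* "Representable as an unsigned char" is a statement about the value. */
int fits_uchar(int v)
{
    return v >= 0 && v <= UCHAR_MAX;
}
```

by_value(-1) is UCHAR_MAX on every implementation, and fits_uchar(-1) is 0 because the value had to change. by_bytes(-1) happens to give the same byte on two's complement machines, but C11 also permits other signed representations where it would differ.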

It seems trivial in retrospect. I might have been confused by the habit of treating b'\xFF', 0xff, 255, -1 as the same byte in Python:

>>> (255).to_bytes(1, 'big')
b'\xff'
>>> int.from_bytes(b'\xFF', 'big')
255
>>> 255 == 0xff
True
>>> (-1).to_bytes(1, 'big', signed=True)
b'\xff'

and by disbelief that it is undefined behavior to pass a character to a character classification function, e.g., isspace(CHAR_MIN).
