Is it safe to memset bool to 0?

Posted 2019-01-09 02:18

Question:

Suppose I have some legacy code which cannot be changed unless a bug is discovered, and it contains this code:

bool data[32];
memset(data, 0, sizeof(data));

Is this a safe way to set all bool in the array to a false value?

More generally, is it safe to memset a bool to 0 in order to make its value false?

Is it guaranteed to work on all compilers? Or do I need to request a fix?

Answer 1:

I believe this is unspecified, although it seems likely that the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):

Boost.Container uses std::memset with a zero value to initialize some types as in most platforms this initialization yields to the desired value initialization with improved performance.

Following the C11 standard, Boost.Container assumes that for any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type. Since _Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.

The C11 quote that they point to as a rationale actually comes from a C99 defect report, defect 263: all-zero bits representations, which added the following:

For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.

So the question is whether that assumption is correct: are the underlying object representations of integer types compatible between C and C++? The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was not resolved. I cannot find conclusive evidence of this in the draft standard. There are a couple of cases where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:

[...] The signed and unsigned integer types shall satisfy the constraints given in the C standard, section 5.2.4.2.1.

and 3.9 [basic.types] which says:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.[44]

where footnote 44 (which is not normative) says:

The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

The closest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types.[50] A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.[51] [ Example: this International Standard permits 2's complement, 1's complement and signed magnitude representations for integral types. —end example ]

The same section also says:

Values of type bool are either true or false.

but all we know of true and false is:

The Boolean literals are the keywords false and true. Such literals are prvalues and have type bool.

and we know they are convertible to 0 and 1:

A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.

but this gets us no closer to the underlying representation.
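To make that distinction concrete, here is a minimal sketch, assuming a C++11 compiler. The helper name to_int is hypothetical. Every check below is stated in terms of values, which is all the quoted conversion rules cover; none of them inspects the bytes that store a bool.

```cpp
#include <type_traits>

// bool is one of the integral types listed in 3.9.1.
static_assert(std::is_integral<bool>::value, "bool is an integral type");

// Value-level conversions guaranteed by the standard.
static_assert(static_cast<int>(false) == 0, "false converts to zero");
static_assert(static_cast<int>(true) == 1, "true converts to one");
static_assert(static_cast<bool>(0) == false, "zero converts to false");
static_assert(static_cast<bool>(42) == true, "non-zero converts to true");

// Hypothetical helper: the implicit bool-to-int conversion at run time.
int to_int(bool b) {
    return b;
}
```

These assertions hold on any conforming implementation, whatever bit pattern it uses to store false.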

As far as I can tell, the only place where the standard referenced the actual underlying bit value, besides padding bits, was removed via defect report 1796: Is all-bits-zero for null characters a meaningful requirement?:

It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.

There are more defect reports that deal with the gaps in the standard with respect to what a bit is and the difference between the value representation and the object representation.

Practically, I would expect this to work, but I would not consider it safe, since we cannot nail this down in the standard. Do you need to change it? That is not clear; there is a non-trivial trade-off involved. Assuming it works now, the question is whether it is likely to break with future versions of various compilers, and that is unknown.



Answer 2:

Is it guaranteed by the law? No.

C++ says nothing about the representation of bool values.

Is it guaranteed by practical reality? Yes.

I mean, if you wish to find a C++ implementation that does not represent boolean false as a sequence of zeroes, I shall wish you luck. Given that false must implicitly convert to 0, and true must implicitly convert to 1, and 0 must implicitly convert to false, and non-0 must implicitly convert to true … well, you'd be silly to implement it any other way.

Whether that means it's "safe" is for you to decide.

I don't usually say this, but if I were in your situation I would be happy to let this slide. If you're really concerned, you can add a test executable to your distributable to validate the precondition on each target platform before installing the real project.
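A minimal sketch of such a precondition check, assuming a C++11 compiler. The function name memset_zero_matches_false is hypothetical and not part of the legacy code. It compares the raw bytes of a value-initialized bool array against an all-zero buffer, so it never reads a bool from memory whose representation might be invalid.

```cpp
#include <cstring>

// Hypothetical helper: reports whether zero-filled bytes have the same
// object representation as an array of bool whose elements are all false.
bool memset_zero_matches_false() {
    bool expected[32] = {};                   // value-initialized: all false
    unsigned char zeroed[sizeof(expected)];
    std::memset(zeroed, 0, sizeof(zeroed));   // the bytes the legacy code writes
    // Compare raw bytes only; no bool is read from the zeroed buffer.
    return std::memcmp(expected, zeroed, sizeof(expected)) == 0;
}
```

A test executable could call this from main and return a non-zero exit status when the result is false. The check is conservative: if an implementation stored non-zero padding bits in false, it would report a mismatch.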



Answer 3:

No. It is not safe (or more specifically, portable). However, it likely works by virtue of the fact that your typical implementation will:

  1. use 0 to represent a boolean false (the C++ specification requires that false convert to 0, although it does not fix the bit pattern)
  2. generate an array of elements that memset() can deal with.

However, best practice would dictate using bool data[32] = {false}. Additionally, this will likely free the compiler to represent the structure differently internally, since using memset() could force it to generate a 32-byte array of values rather than, say, a single 4-byte value that fits nicely within your average CPU register.
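As a sketch of what memset-free code could look like, assuming a C++11 compiler: the helper names none_set, reset_flags and fresh_array_is_all_false are hypothetical. Initialization with braces covers a freshly declared array, and std::fill covers resetting an existing one. Both work on bool values rather than raw bytes, so neither depends on how false is stored.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: true when no element of the array is set.
bool none_set(const bool (&data)[32]) {
    return std::none_of(std::begin(data), std::end(data),
                        [](bool b) { return b; });
}

// Hypothetical helper: resets an existing array without touching raw bytes.
void reset_flags(bool (&data)[32]) {
    std::fill(std::begin(data), std::end(data), false);
}

// Empty braces value-initialize every element, so each one is false.
bool fresh_array_is_all_false() {
    bool data[32] = {};
    return none_set(data);
}
```

Writing = {false} sets the first element explicitly and value-initializes the rest, so it has the same effect as = {} here.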



Answer 4:

From 3.9.1/7:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.

Given this, I can't see any possible implementation of bool that wouldn't represent false as all 0 bits.