Initializing an object to all zeroes

Posted 2020-01-29 06:27

祖国的老花朵

Question:

Often the valid initial state of a data structure is one in which all members are zero. Even when programming in C++, one may need to interface with an external API for which this is the case.

Is there any practical difference between:

some_struct s;
memset(&s, 0, sizeof(s));

and simply

some_struct s = { 0 };

Do folks find themselves using both, with a method for choosing which is more appropriate for a given application? (Hopefully it is understood that this is only currently applicable to POD structures; you'd get all sorts of havoc if there was a C++ std::string in that structure.)

Speaking for myself, as mostly a C++ programmer who doesn't use memset much, I'm never certain of the function signature, so I find the second example easier to use. It is also less typing, more compact, and maybe even more obvious: it says "this object is initialized to zero" right in the declaration, rather than making the reader wait for the next line of code and think, "oh, this object is zero-initialized."

When creating classes and structs in C++ I tend to use initialization lists. I'm curious about folks' thoughts on the two "C style" initializations above, rather than a comparison against what is available in C++, since I suspect many of us interface with C libraries even if we code mostly in C++ ourselves.

Edit: Neil Butterworth posed this question, in followup, that I believe is an interesting corollary to this question.

Answer 1:

memset is practically never the right way to do it. And yes, there is a practical difference (see below).

In C++ not everything can be initialized with literal 0 (objects of enum types can't be), which is why in C++ the common idiom is

some_struct s = {};

while in C the idiom is

some_struct s = { 0 };

Note that in C, = { 0 } is what can be called the universal zero initializer. It can be used with objects of virtually any type, since {}-enclosed initializers are allowed with scalar objects as well:

int x = { 0 }; /* legal in C (and in C++) */

which makes the = { 0 } useful in generic type-independent C code (type-independent macros for example).
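
As a rough illustration of that point, here is a small C++ sketch in which the same = { 0 } spelling is applied to a scalar, an array, and a plain aggregate. The type point and the function name are made up for this example, and note that it only works here because the first member of point can be initialized from 0.

```cpp
#include <cassert>

// Hypothetical aggregate used only for this illustration.
struct point {
    int x;
    double y;
    char tag[4];
};

// Returns true if every object initialized with "= { 0 }" ends up zero.
bool brace_zero_works() {
    int i = { 0 };       // scalar
    double d = { 0 };    // scalar of another type
    int arr[3] = { 0 };  // array: the remaining elements are zeroed too
    point p = { 0 };     // aggregate: the remaining members are zeroed too

    return i == 0 && d == 0.0 &&
           arr[0] == 0 && arr[1] == 0 && arr[2] == 0 &&
           p.x == 0 && p.y == 0.0 &&
           p.tag[0] == 0 && p.tag[3] == 0;
}
```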

The drawback of the = { 0 } initializer in C89/90 and C++ is that it can only be used as part of a declaration. (C99 fixed this problem by introducing compound literals. Similar functionality is coming to C++ as well.) For this reason you might see many programmers use memset to zero something out in the middle of C89/90 or C++ code. Yet I'd say that the proper way to do it is still without memset, with something like

some_struct s;
...
{
  const some_struct ZERO = { 0 };  
  s = ZERO;
}
...

i.e. by introducing a "fictive" block in the middle of the code, even though it might not look too pretty at first sight. Of course, in C++ there's no need to introduce a block, since declarations may appear anywhere in the code.
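
One possible C++ spelling of the same idea (not necessarily the one the answer had in mind) is to assign a value-initialized temporary. This sketch assumes a copy-assignable type without a user-provided constructor; the members of some_struct and the helper name reset are invented for the example.

```cpp
#include <cassert>

// Hypothetical POD type standing in for some_struct.
struct some_struct {
    int id;
    double value;
    void* handle;
};

// Resets an existing object by assigning a value-initialized temporary.
// For a type like this, with no user-provided constructor,
// value-initialization zero-initializes every member.
void reset(some_struct& s) {
    s = some_struct();
}
```

Unlike memset, this leaves it to the compiler to pick the right representation of "zero" for each member.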

As for the practical difference... You might hear some people say that memset will produce the same results in practice, since in practice the physical all-zero bit pattern is what is used to represent zero values for all types. However, this is generally not true. An immediate example that would demonstrate the difference in a typical C++ implementation is a pointer-to-data-member type

struct S;
...

int S::*p = { 0 };
assert(p == NULL); // this assertion is guaranteed to hold

memset(&p, 0, sizeof p);
assert(p == NULL); // this assertion will normally fail

This happens because a typical implementation usually uses the all-one bit pattern (0xFFFF...) to represent the null pointer of this type. The above example demonstrates a real-life practical difference between a zeroing memset and a normal = { 0 } initializer.
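
If you want to see what your own implementation does, a sketch along these lines can help. It inspects the bytes of a null pointer-to-data-member instead of relying on the comparison after memset. The struct members and function names are made up for the example, and the result of the first function is implementation-specific.

```cpp
#include <cassert>
#include <cstring>

struct S {
    int first;
    int second;
};

// Reports whether a null pointer-to-data-member is stored as all-zero bytes
// on the current implementation. The answer is implementation-specific.
bool null_member_pointer_is_all_zero_bytes() {
    int S::*p = nullptr;
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);
    for (unsigned char b : bytes) {
        if (b != 0) return false;
    }
    return true;
}

// Initializing from a null pointer constant always yields a null member pointer.
bool brace_init_gives_null() {
    int S::*p = { 0 };
    return p == nullptr;
}
```

On implementations that follow the Itanium C++ ABI (GCC and Clang on most platforms), I would expect the first function to return false, because the null value is stored as -1 there. The second function returns true everywhere.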

Answer 2:

some_struct s = { 0 }; is guaranteed to work; memset relies on implementation details and is best avoided.

Answer 3:

If the struct contains pointers, the value of all bits zero as produced by memset may not mean the same as assigning a 0 to it in the C (or C++) code, i.e. a NULL pointer.

(It might also be the case with floats and doubles, but that I've never encountered. However, I don't think the standards guarantee them to become zero with memset either.)
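
For the floating-point case, a small check like the following sketch shows how one might confirm the behaviour on a given platform. It assumes nothing beyond the standard library; the names are made up. Where double follows IEC 60559 (IEEE 754), all-zero bits encode +0.0, but the language standards themselves do not promise that representation.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Returns true if a double whose bytes were all set to zero compares equal to 0.0.
bool memset_double_is_zero() {
    double d = 1.5;
    std::memset(&d, 0, sizeof d);
    return d == 0.0;
}

// True when double follows IEC 60559 (IEEE 754), where all-zero bits encode +0.0.
constexpr bool double_is_ieee754 = std::numeric_limits<double>::is_iec559;
```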

Edit: From a more pragmatic perspective, I'd still avoid memset wherever possible, as it is an additional function call, longer to write, and (in my opinion) less clear in intent than = { 0 }.

Answer 4:

Depending on the compiler optimization, there may be some threshold above which memset is faster, but that would usually be well above the normal size of stack based variables. Using memset on a C++ object with a virtual table is of course bad.
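
If memset has to be used anyway, one way to guard against the virtual-table mistake is a wrapper that refuses to compile for unsuitable types. This is only a sketch: zero_bytes, plain, and with_vtable are hypothetical names, and the check needs C++11's type traits.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: byte-wise zeroing that refuses to compile for types
// that are not trivially copyable (for example, classes with virtual functions).
template <typename T>
void zero_bytes(T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "zero_bytes requires a trivially copyable type");
    std::memset(&obj, 0, sizeof obj);
}

// A plain struct that the helper accepts.
struct plain {
    int a;
    int b;
};

// A polymorphic type that the helper rejects at compile time.
struct with_vtable {
    virtual ~with_vtable() {}
    int a;
};
```

Calling zero_bytes on a with_vtable object fails to compile instead of silently overwriting the vtable pointer. Keep in mind that the check only rules out the worst cases: a trivially copyable type can still contain members, such as member pointers, for which all-zero bits do not mean "zero".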

Answer 5:

I found a good solution to be:

template<typename T> void my_zero(T& e) {
    static T dummy_zero_object;
    e = dummy_zero_object;
}

my_zero(s);

This does the right thing not only for fundamental types and user-defined types, but it also zero-initializes types for which the default constructor is defined but does not initialize all member variables --- especially classes containing non-trivial union members.
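
To make the claim about partially initializing constructors concrete, here is a sketch with a made-up type, partial, whose default constructor sets only one member. It works because objects with static storage duration are zero-initialized before their constructor runs. It does assume T is default-constructible and copy-assignable.

```cpp
#include <cassert>

// my_zero as given in the answer above.
template <typename T>
void my_zero(T& e) {
    static T dummy_zero_object;  // zero-initialized first, then default-constructed
    e = dummy_zero_object;
}

// Hypothetical type whose default constructor leaves `b` untouched.
struct partial {
    int a;
    int b;
    partial() : a(0) {}
};
```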

Answer 6:

The only practical difference is that the ={0}; syntax is a bit clearer about saying "initialize this to be empty" (at least it seems clearer to me).

Purely theoretically, there are a few situations in which memset could fail, but as far as I know, they really are just that: theoretical. OTOH, given that it's inferior from both a theoretical and a practical viewpoint, I have a hard time figuring out why anybody would want to use memset for this task.

Answer 7:

I've never understood the mysterious goodness of setting everything to zero, which even if it is defined seems unlikely to be desirable. As this is tagged as C++, the correct solution to initialisation is to give the struct or class a constructor.
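
A minimal sketch of what that might look like, using a made-up account type:

```cpp
#include <cassert>
#include <string>

// Hypothetical record type; the constructor establishes a known state
// for every member, so callers never need memset or "= { 0 }".
struct account {
    std::string owner;
    int id;
    double balance;

    account() : owner(), id(0), balance(0.0) {}
};
```

The trade-off is that a type with a user-provided constructor is no longer an aggregate, so it can't be initialized with = { 0 } and may no longer be usable directly with a C API.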

Answer 8:

"Hopefully it is understood that this is only currently available for POD structures; you'd get a compiler error if there was a C++ std::string in that structure."

No you won't. If you use memset on such a struct, at best you will just crash, and at worst you get some gibberish. The = { } way can be used perfectly fine on non-POD structs, as long as they are aggregates. The = { } way is the best way to take in C++. Please note that there is no reason in C++ to put that 0 in it, nor is it recommended, since it drastically reduces the cases in which it can be used:

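#include <string>

struct A {
  std::string a;
  int b;
};

int main() {
  A a1 = { 0 };  // first: tries to build the std::string from a null pointer
  A a2 = { };    // second: empty string, b is zero
}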

The first will not do what you want: it will try to construct the std::string from a C string, passing a null pointer to its constructor. The second, however, does what you want: it creates an empty string.
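
The safe variant can be checked with a short sketch like this one. The helper name make_empty_a is made up; struct A is the one from the answer.

```cpp
#include <cassert>
#include <string>

struct A {
    std::string a;
    int b;
};

// Builds an A using empty-brace initialization: the string member is
// default-constructed (empty) and the int member is zero-initialized.
A make_empty_a() {
    A a = {};
    return a;
}
```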

Answer 9:

I think the initialization communicates much more clearly what you are actually doing: you are initializing the struct. When the new standard is out, that way of initializing will get used even more (initializing containers with {} is something to look forward to). The memset way is slightly more error-prone and does not communicate as clearly what you are doing. That might not count for much when programming alone, but it means a great deal when working in a team.

For some people working with C++, memset, malloc & co. are quite esoteric creatures. I have encountered a few myself.

Answer 10:

The best method for clearing structures is to set each field individually:

struct MyStruct
{
  std::string name;
  int age;
  double checking_account_balance;
  void clear(void)
  {
     name.erase();
     age = 0;
     checking_account_balance = 0.0;
  }
};

In the above example, a clear method is defined to set all the members to a known state or value. The memset and std::fill methods may not work due to std::string and double types. A more robust program clears each field individually.

I prefer having a more robust program than spending less time typing.

Answer 11:

The bzero function is another option.

#include <strings.h>
void bzero(void *s, size_t n);

Answer 12:

In C I prefer using {0,} to the equivalent memset(). However gcc warns about this usage :( Details here: http://www.pixelbeat.org/programming/gcc/auto_init.html

In C++ they're usually equivalent, but as always with C++ there are corner cases to consider (noted in other answers).