Why compilers put zeros into arrays while they do not have to?

Posted 2020-01-20 02:57

Question:

I'm trying to understand when compilers should value-initialize arrays and when they should default-initialize them. I'm trying two options: a raw array, and an array aggregated in a struct:

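#include <algorithm>  // std::count
#include <cstdint>    // uint32_t
#include <iostream>
#include <string>

const int N = 1000;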

struct A 
{
  uint32_t arr[N];

  A() = default;
};

void print(uint32_t* arr, const std::string& message)
{
  std::cout << message << ": " << 
    (std::count(arr, arr + N, 0) == N ? "all zeros" : "garbage") << std::endl;
}

int main()
{
  uint32_t arrDefault[N];
  print(arrDefault, "Automatic array,  default initialization");

  uint32_t arrValue[N] = {};
  print(arrValue, "Automatic array,  value   initialization");

  uint32_t* parrDefault = new uint32_t[N];
  print(parrDefault, "  Dynamic array,  default initialization");

  uint32_t* parrValue = new uint32_t[N]();
  print(parrValue, "  Dynamic array,  value   initialization");

  A structDefault;
  print(structDefault.arr, "Automatic struct, default initialization");

  A structValue{};
  print(structValue.arr, "Automatic struct, value   initialization");

  A* pstructDefault = new A;
  print(pstructDefault->arr, "  Dynamic struct, default initialization");

  A* psstructValue = new A();
  print(psstructValue->arr, "  Dynamic struct, value   initialization");
}

Here is what I see for clang and VC++:

Automatic array,  default initialization: garbage
Automatic array,  value   initialization: all zeros
  Dynamic array,  default initialization: garbage
  Dynamic array,  value   initialization: all zeros
Automatic struct, default initialization: all zeros
Automatic struct, value   initialization: all zeros
  Dynamic struct, default initialization: garbage
  Dynamic struct, value   initialization: all zeros

Output for gcc is different only in the first line, where it also puts "all zeros".

From my point of view they are all wrong, and what I expect is:

Automatic array,  default initialization: garbage
Automatic array,  value   initialization: all zeros
  Dynamic array,  default initialization: garbage
  Dynamic array,  value   initialization: all zeros
Automatic struct, default initialization: garbage
Automatic struct, value   initialization: garbage
  Dynamic struct, default initialization: garbage
  Dynamic struct, value   initialization: garbage

I.e. output is ok for raw arrays (except for gcc): we have garbage for default and zeros for value. Great. But for a struct I would expect to have garbage all the time. From default initialization:

Default initialization is performed in three situations:

  1. ...
  2. ...
  3. when a base class or a non-static data member is not mentioned in a constructor initializer list and that constructor is called.

The effects of default initialization are:

  • if T is a non-POD (until C++11) class type, ...
  • if T is an array type, every element of the array is default-initialized;
  • otherwise, nothing is done: the objects with automatic storage duration (and their subobjects) are initialized to indeterminate values.

In my example I have non-static data member that is not mentioned in a constructor initializer list, which is an array of POD type. I expect it to be left with indeterminate values, no matter how my struct is constructed.

My questions are:

  • Why do compilers violate that? I mean, why do they put zeros when they do not have to, wasting my runtime? Am I misreading the rules?
  • How can I enforce such behavior, to make sure I do not waste runtime populating arrays with zeros?
  • Why does gcc perform value initialization for an automatic array?

Answer 1:

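A structValue{}; is aggregate initialization, so zeros are guaranteed. (Since C++20 a user-declared constructor makes A a non-aggregate, so A structValue{}; becomes value initialization instead, which still zero-initializes here.)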

A has no user-provided constructor (a constructor explicitly defaulted on its first declaration does not count as one), so value initialization, as in A* psstructValue = new A();, zero-initializes the object first. Zeros are guaranteed there as well.

For the default-initialization cases: reading uninitialized variables is undefined behavior (UB), so the compiler can do whatever it wants. Showing you 0 is just as legal as crashing. Maybe there happened to be zeros in the memory you read. Maybe the compiler felt like zero-initializing. Both are equally fine from the standard's point of view.

That being said, you have a better chance of seeing garbage when testing with Release / optimized builds. Debug builds tend to do extra work to help diagnose problems, including some extra initialization.

(For the record: gcc and clang with -O3 appear to do no unnecessary initialization on my Linux system at first glance. Nevertheless, I got "all zeroes" for every case. That appears to be by chance.)



Answer 2:

The other answer doesn't really address the REASON; it just kind of dances around the language specification.

The actual reason is how the initialization process works.

Ask yourself the question: how do I know whether something has been initialized?

That is why data with static storage duration DOES need to be zero-initialized, while non-static data does not. If all static data were not zeroed out first, the dynamic initialization of statics (look it up) would be basically impossible.

You would constantly run into issues like two statics that obliquely reference each other in their initializers, and everything would fall apart.

So without this rule it would be basically impossible to write a C++ compiler. There are other initialization schemes that don't have this requirement, but implementing them would require a big overhaul of the language.