Ctor Initializer: self initialization causes crash

Posted 2020-04-17 04:53

Question:

I had a hard time debugging a crash on production. Just wanted to confirm with folks here about the semantics. We have a class like ...

class Test {
public:
  Test()
  {
    // members initialized ...
    m_str = m_str;
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

Someone changed the initialization to use a constructor initializer list, which is reasonable within our code's semantics. Among other things, the order of initialization and the initial values are correct. So the class now looks like ...

class Test {
public:
  Test()
    : /* other inits ... */ m_str(m_str)
  {
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

But the code suddenly started crashing! I narrowed the long list of initializers down to this one piece of code: m_str(m_str). I confirmed this via link text.

Does it have to crash? What does the standard say about this? (Is it undefined behavior?)

Answer 1:

The first constructor is equivalent to

  Test()
  : m_str()
  {
    // members initialized ...
    m_str = m_str;
  }

That is, by the time you get to the assignment within the constructor body, m_str has already been implicitly initialized to an empty string. So the self-assignment, although completely meaningless and superfluous, causes no problems (std::string::operator=(), like any well-written assignment operator, handles self-assignment safely and effectively does nothing in this case).

However, in the second constructor, you are trying to initialize m_str with itself in the initializer list - at which point it is not yet initialized. So the result is undefined behaviour.
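To make the well-defined half of this contrast concrete, here is a minimal sketch. The names TestAssignInBody, str() and demo_body_self_assignment() are made up for illustration and are not from the question. It only shows that the member is already a valid, empty string when the constructor body runs; the m_str(m_str) variant is deliberately left out, because running it would be undefined behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the first version of the question's Test class.
class TestAssignInBody {
public:
    TestAssignInBody() {
        // No mem-initializer was given for m_str, so it has already been
        // default-constructed (empty string) before this body runs.
        // Assigning through a reference has the same effect as the
        // question's `m_str = m_str;` and only sidesteps the
        // self-assignment warnings that some compilers emit.
        const std::string& self = m_str;
        m_str = self;
    }

    const std::string& str() const { return m_str; }

private:
    std::string m_str;
};

// Hypothetical helper: constructs the object and checks the member.
void demo_body_self_assignment() {
    TestAssignInBody t;
    assert(t.str().empty());  // still a valid, empty string
}
```

Calling demo_body_self_assignment() from main() should pass the assertion on any conforming implementation, since everything in it is well defined.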

Update: For primitive types, initializing a member with itself is still undefined behaviour (resulting in a field with a garbage value), but it does not crash (usually - see the comments below for exceptions) because primitive types by definition have no constructors or destructors and contain no pointers to other objects.

The same is true for any type that does not contain pointer members with ownership semantics. std::string is hereby demonstrated not to be one of these :-)



Answer 2:

m_str is constructed in the initialization list. Therefore, at the time you are initializing it from itself, it has not yet been constructed. Hence, undefined behavior.

(What is that self-assignment supposed to do anyway?)



Answer 3:

The original "initialization" by assignment is completely superfluous.

It didn't do any harm, other than wasting processor cycles, because at the time of the assignment the m_str member had already been initialized, by default.

In the second code snippet the default initialization is overridden to use the as-yet-uninitialized member to initialize itself. That's Undefined Behavior. And it's completely unnecessary: just remove it (and don't re-introduce the original time-waster either).

By turning up the warning level of your compiler you may be able to get warnings about this and similar trivially ungood code.

Unfortunately the problem you're having is not this technical one, it's much more fundamental. It's as if a worker in a car factory posed a question about the square wheels they're putting on the new car brand. Then the problem isn't that the square wheels don't work, it's that a whole lot of engineers and managers have been involved in the decision to use the fancy-looking square wheels and none of them objected -- some of them undoubtedly didn't understand that square wheels don't work, but most of them, I suspect, were simply afraid to say what they were 100% sure of. So it's most probably a management problem. I'm sorry, but I don't know a fix for that...



Answer 4:

Undefined behavior doesn't have to lead to a crash -- it can do just about anything, from continuing to work as if there was no problem at all, to crashing immediately, to doing something really strange that causes seemingly unrelated problems later. The canonical claim is that it makes "demons fly out of your nose" (aka, "causes nasal demons"). At one time the inventor of the phrase had a (pretty cool) web site telling about the nuclear war that started from somebody causing undefined behavior in the "DeathStation 9000".

Edit: The exact wording from the standard is (§1.3.12):

1.3.12 undefined behavior [defns.undefined]

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).



Answer 5:

This is the same difference as between

std::string str;
str = str;

and

std::string str(str);

The former works (although it's nonsense), the latter doesn't, since it tries to copy-construct an object from a not-yet-constructed object.

Of course, the way to go would be

Test() : m_str() {}
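
As a rough sketch of that fix in context (the class name TestFixed, the accessor str() and the second constructor are assumptions added for illustration, not part of the original code): the member is either value-initialized explicitly or initialized from a distinct argument, never from itself.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical corrected version of the question's Test class.
class TestFixed {
public:
    // Explicit value-initialization, as suggested above. Omitting m_str
    // from the list entirely would default-construct it to the same
    // empty string.
    TestFixed() : m_str() {}

    // Hypothetical extra constructor: when a real initial value is wanted,
    // take it from a separate argument rather than from the member.
    explicit TestFixed(std::string initial) : m_str(std::move(initial)) {}

    const std::string& str() const { return m_str; }

private:
    std::string m_str;
};

// Hypothetical helper exercising both constructors.
void demo_fixed_constructors() {
    TestFixed empty;
    assert(empty.str().empty());

    TestFixed named("hello");
    assert(named.str() == "hello");
}
```

Both constructors read only from objects that are already alive, so there is no lifetime question left for the compiler or the reader to worry about.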