I found this code on cppreference.com. It's the strangest C++ I've seen, and I have a few questions about it:
#include <iostream>
#include <new>
#include <string>
#include <vector>

union S
{
    std::string str;
    std::vector<int> vec;
    ~S() {} // needed because the members have non-trivial destructors
};

int main()
{
    S s = { "Hello, world" };
    // at this point, reading from s.vec is undefined behavior
    std::cout << "s.str = " << s.str << '\n';
    s.str.~basic_string<char>();
    new (&s.vec) std::vector<int>;
    // now, s.vec is the active member of the union
    s.vec.push_back(10);
    std::cout << s.vec.size() << '\n';
    s.vec.~vector<int>();
}
I want to make sure I've got a few things right.
- The union forces you to initialise one of the union members by deleting the default constructors, in this case he initialised the string with Hello World.
- After he's initialised the string, the vector technically doesn't exist yet? I can access it, but it isn't constructed yet?
- He explicitly destroys the string object by calling its destructor. In this case when S goes out of scope, will the ~S() destructor be called? If so, on which object? If he doesn't call the destructor explicitly on the string is it a memory leak? I'm leaning towards no because strings clean themselves up, but for unions I don't know. He calls the destructor for both the string and vector himself, so the ~S() destructor seems useless, but when I delete it my compiler won't let me compile it.
- This is the first time I've seen someone use the new operator to place an object on the stack. In this case is this the only way now that the vector can be used?
- When you use placement new as he does with the vector, you're not supposed to call delete on it because new memory hasn't been allocated. Usually if you placement new on the heap you have to free() the memory to avoid a leak, but in this case what happens if he lets the vector and union go out of scope without calling the destructor?
I find this really confusing.
- Yes, exactly.
- Because the vector and the string use the same underlying storage (which is how unions work), and that storage currently contains a string, there is no place for a vector to be, and trying to access it would be undefined behavior. It’s not that it hasn’t been constructed yet; it’s that it cannot be constructed because there’s a string in the way.
- Whenever an S goes out of scope, its destructor is called. In this case, that’s the union’s destructor, which was explicitly defined to do nothing (because the union can’t know which member is active, so it can’t actually do what it’s supposed to). Because the union cannot know which of its members is active, if you don’t explicitly call the destructor of the string, it cannot know there was a string there and the string will not be cleaned up. The compiler makes you write your own destructor when there are union members with non-trivial destructors, because it can’t know how to clean that up and hopes that you do; in this example you don’t know how to clean it up either, so you do nothing in the union’s destructor and make the person who uses S call the destructor on the correct element manually.
- This is called “placement new”, and is the typical way to construct an object in an existing memory location instead of allocating a new one. There are uses for it besides unions, but I believe that it’s the only way to get a vector into this union without using undefined behavior.
- As addressed in part 3), when s goes out of scope, it doesn’t know if it holds a string or a vector. The ~S destructor does nothing, so you need to destroy the vector with its own destructor, like with the string.
To see why the union can’t automatically know which destructor to call, consider this alternate function:
void maybe_string() {
    S s = {"Hello, world"};
    bool b;
    std::cin >> b;
    if (b) {
        // switch the active member from the string to the vector
        s.str.~basic_string<char>();
        new (&s.vec) std::vector<int>;
    }
    b = false;
    // Now there is no more information in the program for what destructor to call.
}
At the end of the function, the compiler has no way to know if s contains a string or a vector. If you don’t call a destructor manually (assuming you had a way to tell, which I don’t think you do here), it will have to play it safe and not destroy either member. Instead of having complicated rules about when the compiler would be able to destroy the active member and when it wouldn’t destroy anything, the creators of C++ decided to keep things simple and just never destroy the active member of a union automatically and instead force the programmer to do it manually.
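One common way to get "a way to tell" is to store a tag next to the union. The following is only a minimal sketch of that idea, not something from the cppreference example: the class name StringOrVector, the Active enum, and all member function names are made up for illustration, and copying is simply disabled to keep it short.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of the original example): it keeps a tag
// next to the union so the destructor knows which member is alive.
class StringOrVector {
public:
    explicit StringOrVector(std::string s) : active_(Active::Str) {
        new (&u_.str) std::string(std::move(s));  // start the string's lifetime
    }

    ~StringOrVector() { destroy_active(); }

    // Copying is disabled to keep the sketch short.
    StringOrVector(const StringOrVector&) = delete;
    StringOrVector& operator=(const StringOrVector&) = delete;

    // Destroy whatever is active, then start the vector's lifetime.
    void become_vector() {
        destroy_active();
        new (&u_.vec) std::vector<int>();
        active_ = Active::Vec;
    }

    bool holds_vector() const { return active_ == Active::Vec; }

    std::string& str() {
        assert(!holds_vector());
        return u_.str;
    }

    std::vector<int>& vec() {
        assert(holds_vector());
        return u_.vec;
    }

private:
    enum class Active { Str, Vec };

    union U {
        std::string str;
        std::vector<int> vec;
        U() {}   // constructs no member
        ~U() {}  // the enclosing class destroys the active member
    };

    void destroy_active() {
        if (active_ == Active::Str) {
            u_.str.~basic_string();
        } else {
            u_.vec.~vector();
        }
    }

    U u_;
    Active active_;
};
```

With the tag in place, the wrapper's destructor can pick the right member destructor, which is exactly the information a bare union does not have.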
The union forces you to initialise one of the union members by deleting the default constructors, in this case he initialised the string with Hello World.
Correct
After he's initialised the string, the vector technically doesn't exist yet? I can access it, but it isn't constructed yet?
Well, just because it is accessible doesn't mean you are allowed to access it. Since it is not the active member, accessing it is undefined behavior. The reason for this is that its lifetime has not begun, because its constructor has not yet been called.
will the ~S() destructor be called?
No, explicitly destroying the string does not run ~S(). s itself will only be destroyed when it goes out of scope, and that is when ~S() runs.
If he doesn't call the destructor explicitly on the string is it a memory leak?
Yes, but more importantly it is undefined behavior. You can't change the active member without destroying the current one first, since its destructor is not trivial. If you don't destroy the string before you create the vector, then you lose the state of the string, which includes the memory it was holding (if it held any; with the small string optimization it might not hold any).
so the ~S() destructor seems useless, but when I delete it my compiler won't let me compile it.
It is useless as you say, but it is really all you can do. The union has to have a destructor, and the compiler-provided one is deleted because std::string and std::vector have non-trivial destructors.
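If you want the compiler to confirm this, a few static_asserts will do. This is just a sketch: the union names NoDtor and WithDtor are made up for illustration, and it assumes the union rules of C++11 through C++23 (later standards may relax them).

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical union without a user-provided destructor.
union NoDtor {
    std::string str;
    std::vector<int> vec;
};

// Hypothetical union with the same shape as S.
union WithDtor {
    std::string str;
    std::vector<int> vec;
    ~WithDtor() {}
};

// The implicitly declared destructor is deleted, so NoDtor cannot be destroyed.
static_assert(!std::is_destructible<NoDtor>::value,
              "implicit destructor is deleted");

// Providing an empty destructor makes the union destructible again.
static_assert(std::is_destructible<WithDtor>::value,
              "user-provided destructor is usable");

// The default constructor is still deleted, which is why S needs an initializer.
static_assert(!std::is_default_constructible<WithDtor>::value,
              "default constructor is deleted");
```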
In this case is this the only way now that the vector can be used?
Yes. You have to use placement new in order for the object to be constructed. If you didn't and tried to do something like
s.vec = std::vector<int>{};
then you would be assigning to an object that was never constructed, which is undefined behavior.
vector and union go out of scope without calling the destructor?
Well, if you didn't manually destroy the vector, then you would leak whatever the vector holds, as nothing would be destroyed. As long as you destroy the active member before the union goes out of scope, you are fine.
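Here is a small sketch of that last point. Tracker, Slot, and live_after_scope are hypothetical names used only for illustration: Tracker counts how many of its objects are alive, and Slot is a union with an empty destructor like S.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper type: counts how many Tracker objects are alive.
struct Tracker {
    static int live;
    Tracker() { ++live; }
    ~Tracker() { --live; }
};
int Tracker::live = 0;

// Hypothetical union with an empty destructor, like S.
union Slot {
    Tracker tracker;
    int number;
    Slot() : number(0) {}  // start with the trivial member active
    ~Slot() {}             // does not destroy any member
};

// Returns the number of live Tracker objects after the union is gone.
int live_after_scope() {
    {
        Slot slot;
        new (&slot.tracker) Tracker;  // tracker becomes the active member
        assert(Tracker::live == 1);
        slot.tracker.~Tracker();      // manual destruction, as in main() above
    }  // ~Slot() runs here and does nothing
    return Tracker::live;
}
```

Because the active member is destroyed by hand before the closing brace, the count is back to zero afterwards; ~Slot() itself never touches it.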