I have used unions earlier comfortably; today I was alarmed when I read this post and came to know that this code
union ARGB
{
uint32_t colour;
struct componentsTag
{
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t a;
} components;
} pixel;
pixel.colour = 0xff040201; // ARGB::colour is the active member from now on
// somewhere down the line, without any edit to pixel
if(pixel.components.a) // accessing the non-active member ARGB::components
is actually undefined behaviour I.e. reading from a member of the union other than the one recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can some one please explain it elaborately?
Update:
I wanted to clarify a few things in hindsight.
- The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
- After scouring through C++11's standard I couldn't conclusively say that it calls out accessing/inspecting a non-active union member is undefined/unspecified/implementation-defined. All I could find was §9.5/1:
If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members. §9.2/19: Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
- While in C, (C99 TC3 - DR 283 onwards) it's legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting to do it can still lead to undefined behavior, if the value read happens to be invalid (so called "trap representation") for the type it is read through. Otherwise, the value read is implementation defined.
C89/90 called this out under unspecified behavior (Annex J) and K&R's book says it's implementation defined. Quote from K&R:
This is the purpose of a union - a single variable that can legitimately hold any of one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.
Extract from Stroustrup's TC++PL (emphasis mine)
Use of unions can be essential for compatness of data [...] sometimes misused for "type conversion".
Above all, this question (whose title remains unchanged since my ask) was posed with an intention of understanding the purpose of unions AND not on what the standard allows E.g. Using inheritance for code reuse is, of course, allowed by the C++ standard, but it wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is the reason Andrey's answer continues to remain as the accepted one.
In the C language as it was documented in 1974, all structure members shared a common namespace, and the meaning of "ptr->member" was defined as adding the member's displacement to "ptr" and accessing the resulting address using the member's type. This design made it possible to use the same ptr with member names taken from different structure definitions but with the same offset; programmers used that ability for a variety of purposes.
When structure members were assigned their own namespaces, it became impossible to declare two structure members with the same displacement. Adding unions to the language made it possible to achieve the same semantics that had been available in earlier versions of the language (though the inability to have names exported to an enclosing context may have still necessitated using a find/replace to replace foo->member into foo->type1.member). What was important was not so much that the people who added unions have any particular target usage in mind, but rather that they provide a means by which programmers who had relied upon the earlier semantics, for whatever purpose, should still be able to achieve the same semantics even if they had to use a different syntax to do it.
The purpose of unions is rather obvious, but for some reason people miss it quite often.
The purpose of union is to save memory by using the same memory region for storing different objects at different times. That's it.
It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accomodations to a relatively large number of people, which is what hotels are for.
That's exactly what union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into other member you switch the "active" status to that other member.
For some reason, this original purpose of the union got "overriden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is
not a valid use of unions. It generally leads to undefined behavioris decribed as producing implemenation-defined behavior in C89/90.EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigendums to C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
For one more example of the actual use of unions, the CORBA framework serializes objects using the tagged union approach. All user-defined classes are members of one (huge) union, and an integer identifier tells the demarshaller how to interpret the union.
In C++, Boost Variant implement a safe version of the union, designed to prevent undefined behavior as much as possible.
Its performances are identical to the
enum + union
construct (stack allocated too etc) but it uses a template list of types instead of theenum
:)Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.
As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.
Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++ unions are not much use because they can only contain POD types - effectively those without constructors and destructors.