What is the point behind unions in C?

Posted 2019-04-25 22:41

Question:

I'm going through O'Reilly's Practical C Programming book and have already read the K&R book on the C programming language, but I am really having trouble grasping the concept behind unions.

They take the size of the largest data type that makes them up...and the most recently assigned one overwrites the rest...but why not just use / free memory as needed?

The book mentions that it's used in communication, where you need to set flags of the same size; and on a googled website, that it can eliminate odd-sized memory chunks...but is it of any use in a modern, non-embedded memory space?

Is there something crafty you can do with it and CPU registers? Is it simply a holdover from an earlier era of programming? Or does it, like the infamous goto, still have some powerful use (possibly in tight memory spaces) that makes it worth keeping around?

Answer 1:

Well, you almost answered your own question: memory. Back in the day memory was scarce, and saving even a few kilobytes was useful.

But even today there are scenarios where unions would be useful. For example, if you'd like to implement some kind of variant datatype. The best way to do this is using a union.

This doesn't sound like much, but let's just assume you want to use a variable either storing a 4 character string (like an ID) or a 4 byte number (which could be some hash or indeed just a number).

If you use a classic struct, this would be at least 8 bytes long (more if you're unlucky and the compiler adds padding bytes). Using a union, it's only 4 bytes. So you're saving 50% of the memory, which isn't a lot for one instance, but imagine having a million of these.

While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.
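As a rough sketch of that size comparison (the type and function names here are hypothetical, and the sizes assume a typical platform where uint32_t is 4 bytes with 4-byte alignment):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
struct id_and_number {      /* both members exist side by side */
    char     id[4];
    uint32_t number;
};

union id_or_number {        /* both members share the same 4 bytes */
    char     id[4];
    uint32_t number;
};

/* Store a 4-character ID (exactly 4 bytes, no terminating NUL). */
void set_id(union id_or_number *v, const char *four_chars)
{
    memcpy(v->id, four_chars, sizeof v->id);
}
```

On such a platform sizeof(struct id_and_number) is 8 and sizeof(union id_or_number) is 4. The union itself does not record which member is currently stored, so the program has to know that from context.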



Answer 2:

One use of unions is having two variables occupy the same space, and a second variable in the struct decide what data type you want to read it as.

For example, you could have a boolean 'isDouble' and a union 'doubleOrLong' which has both a double and a long. If isDouble == true, interpret the union as a double; otherwise, interpret it as a long.
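A minimal sketch of that arrangement might look like the following. The struct name and helper functions are made up for illustration; only the isDouble / doubleOrLong idea comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical type following the description above. */
struct number {
    bool is_double;             /* tag: which union member is valid */
    union {
        double d;
        long   l;
    } double_or_long;
};

struct number make_double(double d)
{
    struct number n;
    n.is_double = true;
    n.double_or_long.d = d;
    return n;
}

struct number make_long(long l)
{
    struct number n;
    n.is_double = false;
    n.double_or_long.l = l;
    return n;
}

/* Read the value as a double, whichever member is actually stored. */
double as_double(const struct number *n)
{
    return n->is_double ? n->double_or_long.d
                        : (double)n->double_or_long.l;
}
```

Going through small constructor functions like these keeps the tag and the stored member in sync, which is the part that is easy to get wrong when the fields are set by hand.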

Another use of unions is accessing data types in different representations. For instance, if you know how a double is laid out in memory, you could put a double in a union, access it as a different data type like a long, directly access its bits, its mantissa, its sign, its exponent, whatever, and do some direct manipulation with it.
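As an illustration of that second use, here is a sketch that pulls a double apart into its fields. It assumes double is a 64-bit IEEE 754 value and uses uint64_t instead of long, since long is only 32 bits on some platforms; the names are hypothetical.

```c
#include <stdint.h>

/* Assumes double is a 64-bit IEEE 754 binary64 value, which holds on
   most current platforms but is not guaranteed by the C standard. */
union double_bits {
    double   d;
    uint64_t u;
};

/* 1 if the sign bit is set, else 0. */
unsigned sign_of(double d)
{
    union double_bits b;
    b.d = d;
    return (unsigned)(b.u >> 63);
}

/* The 11-bit biased exponent field (bias is 1023). */
unsigned biased_exponent_of(double d)
{
    union double_bits b;
    b.d = d;
    return (unsigned)((b.u >> 52) & 0x7FFu);
}

/* The 52-bit fraction (mantissa) field. */
uint64_t fraction_of(double d)
{
    union double_bits b;
    b.d = d;
    return b.u & UINT64_C(0x000FFFFFFFFFFFFF);
}
```

Under that assumption, 1.0 has sign 0, biased exponent 1023 and fraction 0, while -2.0 has sign 1 and biased exponent 1024.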

You don't really need this nowadays since memory is so cheap, but in embedded systems it has its uses.



Answer 3:

The Windows API makes use of unions quite a lot. LARGE_INTEGER is an example of such a usage. Basically, if the compiler supports 64-bit integers, use the QuadPart member; otherwise, set the low DWORD and the high DWORD manually.
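The idea can be sketched with a portable stand-in. This is not the Windows definition, only a union modeled on it with hypothetical lowercase names, and the member order assumes a little-endian machine.

```c
#include <stdint.h>

/* Stand-in modeled on LARGE_INTEGER; NOT the Windows header definition.
   The order of the two halves assumes a little-endian machine. */
union large_int {
    struct {
        uint32_t low_part;
        int32_t  high_part;
    } u;
    int64_t quad_part;
};

/* Build the value from two 32-bit halves, as code without a 64-bit
   integer type would have to do. */
union large_int from_halves(int32_t high, uint32_t low)
{
    union large_int v;
    v.u.low_part  = low;
    v.u.high_part = high;
    return v;
}

/* 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    union { uint32_t word; unsigned char bytes[4]; } probe;
    probe.word = 1;
    return probe.bytes[0] == 1;
}
```

On a little-endian machine, from_halves(1, 2) yields a quad_part of 0x0000000100000002, so code with 64-bit support can read the whole value in one go.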



Answer 4:

It's not really a holdover, as the C language was created in 1972, when memory was a real concern.

You could make the argument that in modern, non-embedded space, you might not want to use C as a programming language to begin with. If you've chosen C as your language choice for implementation, you're looking to harness the benefits of C: it's efficient, close-to-metal, which results in tight, fast binaries.

As such, when choosing to use C, you'd still want to take advantage of its benefits, which include memory-space efficiency. The union serves that goal very well: it gives you some degree of type safety while keeping the memory footprint as small as possible.



Answer 5:

One place where I have seen it used is in the Doom 3/idTech 4 Fast Inverse Square Root implementation.

For those unfamiliar with this algorithm, it essentially requires treating a floating point number as an integer. The old Quake (and earlier) version of the code does this as follows:

float y = 2.0f;

// treat the bits of y as an integer
long i  = * ( long * ) &y;

// do some stuff with i

// treat the bits of i as a float
y = * ( float * ) &i;

original source on GitHub

This code takes the address of a floating point number y, casts it to a pointer to a long (i.e., a 32-bit integer in Quake days), and dereferences it into i. Then it does some incredibly bizarre bit-twiddling stuff, and the reverse.

There are two disadvantages of doing it this way. One is that the convoluted address-of, cast, dereference process forces the value of y to be read from memory, rather than from a register [1], and ditto on the way back. On Quake-era computers, however, floating point and integer registers were completely separate so you pretty much had to push to memory and back to deal with this restriction.

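The second is that, at least in C++, this kind of casting is deeply frowned upon, even when doing what amounts to voodoo, as this function does. I'm sure there are more compelling arguments; the usual one is that reading a float object through a pointer to long breaks the strict aliasing rules of both C and C++, so the behavior is undefined and an optimizing compiler may not do what you expect.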

So, in Doom 3, id included the following bit in their new implementation (which uses a different set of bit twiddling, but a similar idea):

union _flint {
        dword                   i;
        float                   f;
};

...
union _flint seed;
seed.i = /* look up some tables to get this */;
double r = seed.f; // <- access the bits of seed.i as a floating point number

original source on GitHub

Theoretically, on an SSE2 machine, this can be accessed through a single register; I'm not sure in practice whether any compiler would do this. It's still somewhat cleaner code in my opinion than the casting games in the earlier Quake version.
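To make the union approach concrete, here is a sketch that applies it to the older Quake-style calculation (the well-known magic constant plus one Newton-Raphson step), not to the table-based Doom 3 code quoted above. The names are hypothetical, and it assumes float is a 32-bit IEEE 754 value. Reading a union member other than the one last written is permitted in C; C++ does not make the same guarantee.

```c
#include <stdint.h>

/* Assumes float is a 32-bit IEEE 754 value. */
union flint {
    uint32_t i;
    float    f;
};

/* Approximate 1/sqrt(x) for x > 0. Uses the Quake-era magic constant
   and one Newton-Raphson step, with a union in place of pointer casts. */
float fast_rsqrt(float x)
{
    union flint v;
    v.f = x;                            /* store as float...           */
    v.i = 0x5f3759dfu - (v.i >> 1);     /* ...manipulate as an integer */
    return v.f * (1.5f - 0.5f * x * v.f * v.f);
}
```

The result is only an approximation: fast_rsqrt(4.0f) comes out close to 0.5, not exactly 0.5.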


[1] Ignoring "sufficiently advanced compiler" arguments.



Tags: c unions