Can I assign a value to one union member and read

Posted 2019-08-13 18:50

Question:

Basically, I have a structure like this:

struct foo {
        /* variable denoting active member of union */
        enum whichmember w;
        union {
                struct some_struct my_struct;
                struct some_struct2 my_struct2;
                struct some_struct3 my_struct3;
                /* let's say that my_struct is the largest member */
        };
};

int main(void)
{
        /* ... */
        /* earlier in main, we get some struct foo d with an */
        /* unknown union assignment; d.w is correct, however */
        struct foo f;
        f.my_struct = d.my_struct; /* my_struct isn't necessarily the */
                                   /* active member, but is the biggest */
        f.w = d.w;
        /* code that determines which member is active through f.w */
        /* ... */
        /* we then access the *correct* member that we just found, */
        /* say, f.my_struct3 */

        f.my_struct3.some_member_not_in_mystruct = /* something */;
        return 0;
}

The question "Accessing C union members via pointers" seems to say that accessing the members via pointers is okay; see the comments there.

But my question concerns directly accessing them. Basically, if I write all the information that I need to the largest member of the union and keep track of types manually, will accessing the manually specified member still yield the correct information every time?

Answer 1:

I note that the code in the question uses an anonymous union, which means that it must be written for C11; anonymous unions were not a part of C90 or C99.
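To make the C11 point concrete, here is a minimal compilable sketch of the question's layout. The definitions of struct some_struct, some_struct2 and some_struct3, the enumerators MY_STRUCT, MY_STRUCT2 and MY_STRUCT3, and the helpers make_foo3 and read_member3 are all hypothetical, since the question does not show them. It needs a C11 compiler (for example with -std=c11); with an anonymous union the members are named directly, as f.my_struct3, rather than through a union name.

```c
#include <assert.h>

/* Hypothetical member types; the question does not define them. */
struct some_struct  { double big[4]; };   /* assumed to be the largest member */
struct some_struct2 { int count; };
struct some_struct3 { int id; int some_member_not_in_mystruct; };

/* Hypothetical tag values for the question's enum whichmember. */
enum whichmember { MY_STRUCT, MY_STRUCT2, MY_STRUCT3 };

struct foo {
    enum whichmember w;             /* which union member is active */
    union {                         /* anonymous union: a C11 feature */
        struct some_struct  my_struct;
        struct some_struct2 my_struct2;
        struct some_struct3 my_struct3;
    };
};

/* Builds a foo whose active member is my_struct3. */
struct foo make_foo3(int id, int value)
{
    struct foo f;
    f.w = MY_STRUCT3;
    f.my_struct3.id = id;           /* no union name needed */
    f.my_struct3.some_member_not_in_mystruct = value;
    return f;
}

/* Reads my_struct3, after checking that the tag says it is active. */
int read_member3(const struct foo *f)
{
    assert(f->w == MY_STRUCT3);
    return f->my_struct3.some_member_not_in_mystruct;
}
```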

ISO/IEC 9899:2011, the current C11 standard, has this to say:

§6.5.2.3 Structure and union members

¶3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

¶4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue.96) If the first expression is a pointer to a qualified type, the result has the so-qualified version of the type of the designated member.

¶5 …

¶6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.


95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

96) If &E is a valid pointer expression (where & is the "address-of" operator, which generates a pointer to its operand), the expression (&E)->MOS is the same as E.MOS.

Italics as in the standard
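A small sketch of the ¶6 "common initial sequence" guarantee may help. The shape, circle and rect types, the kind field, and the functions shape_kind and shape_area are hypothetical illustrations, not part of the question. Both structures begin with int kind, so kind may be read through either member, whichever one was last stored, as long as the complete union type is visible.

```c
#include <assert.h>

/* Hypothetical tag values. */
enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Both structs start with an int, so their common initial
   sequence includes 'kind'. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w, h; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Reads the tag through the circle member even if the rect member was
   stored last; C11 6.5.2.3 paragraph 6 permits this for the common part. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}

/* Dispatches on the tag and reads only the member that matches it. */
double shape_area(const union shape *s)
{
    switch (shape_kind(s)) {
    case KIND_CIRCLE:
        return 3.141592653589793 * s->c.radius * s->c.radius;
    case KIND_RECT:
        return s->r.w * s->r.h;
    default:
        assert(!"unknown shape kind");
        return 0.0;
    }
}
```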

And section §6.2.6 Representations of types says (in part):

§6.2.6.1 General

¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

¶7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.


51) Thus, for example, structure assignment need not copy any padding bits.


My interpretation of what you're doing is that footnote 51 says "it might not work", because you may have assigned only part of the structure. You're treading on thin ice, at best. Against that, you stipulate that the assigned structure (in the f.my_struct = d.my_struct; assignment) is the largest member. The chances are moderately high that it won't go wrong. However, if the padding bytes in the two structures (the active member of the union and the largest member of the union) are in different places, then things could go wrong. If you reported such a problem to the compiler writer, the compiler writer would simply tell you "don't contravene the standard".

So, to the extent I'm a language lawyer, this language lawyer's answer is "It is not guaranteed". In practice, you're unlikely to run into problems, but the possibility is there and you have no comeback on anyone.

To make your code safe, simply use f = d;, assigning the whole structure (union included) rather than a single union member.
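A minimal sketch of that fix, borrowing the Type_A/Type_B layout from the illustrative example below; the helper name copy_foo is hypothetical. Because the whole struct foo is assigned, the union's value, and with it the active member, is copied together with the tag w.

```c
#include <assert.h>

/* Layout from the illustrative example in this answer. */
struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };

struct foo {
    enum whichmember w;
    union {
        struct Type_A s1;           /* largest member */
        struct Type_B s2;
    };
};

/* Copies *d into *f with one structure assignment, so the tag and the
   union's value (including its active member) are copied together. */
void copy_foo(struct foo *f, const struct foo *d)
{
    assert(f && d);
    *f = *d;
}
```

Writing f = d; inline is equivalent; the helper only makes the copy explicit.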


Illustrative Example

Suppose that the machine requires double to be aligned on an 8-byte boundary with sizeof(double) == 8, int to be aligned on a 4-byte boundary with sizeof(int) == 4, and short to be aligned on a 2-byte boundary with sizeof(short) == 2. This is a plausible, even common, set of sizes and alignment requirements.

Further, suppose that you have a two-structure union variant of the structure in the question:

struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };

struct foo
{
    enum whichmember w;
    union
    {
        struct Type_A s1;
        struct Type_B s2;
    };
};

Now, under the sizes and alignments specified, the struct Type_A will occupy 16 bytes, and struct Type_B will occupy 8 bytes, so the union will use 16 bytes too. The layout of the union will be like this:

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | p...a...d...d...i...n...g |               y               |  s1
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|       a       |   b   |   c   |   p...a...d...d...i...n...g   |  s2
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The w element would also mean that there are 8 bytes in struct foo before the (anonymous) union, of which it is likely that w only occupies 4. The size of struct foo is therefore 24 on this machine. That's not particularly relevant to the discussion, though.
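The 8/4/2 sizes above are assumptions about a hypothetical machine. This sketch (the function name print_layout is hypothetical) reports what a real compiler actually chooses, using sizeof and offsetof. On a typical x86-64 Linux build it should report the same 16, 8 and 24 byte figures as above, but targets with different alignment rules (32-bit x86, where double is commonly only 4-byte aligned inside structures, is an often-cited example) may differ.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };

struct foo {
    enum whichmember w;
    union {
        struct Type_A s1;
        struct Type_B s2;
    };
};

/* Prints the sizes and offsets this compiler actually uses. */
void print_layout(void)
{
    /* Members of an anonymous union count as members of struct foo (C11),
       and all union members start at the same offset. */
    assert(offsetof(struct foo, s1) == offsetof(struct foo, s2));

    printf("sizeof(struct Type_A) = %zu, y at offset %zu\n",
           sizeof(struct Type_A), offsetof(struct Type_A, y));
    printf("sizeof(struct Type_B) = %zu\n", sizeof(struct Type_B));
    printf("union at offset %zu, sizeof(struct foo) = %zu\n",
           offsetof(struct foo, s1), sizeof(struct foo));
}
```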

Now suppose we have code like this:

struct foo d;
d.w = TYPE_B;
d.s2.a = 1234;
d.s2.b = 56;
d.s2.c = 78;

struct foo f;
f.s1 = d.s1;
f.w  = TYPE_B;

Now, under the ruling of footnote 51, the structure assignment f.s1 = d.s1; does not have to copy the padding bits. I know of no compiler that behaves like this, but the standard says that a compiler need not copy the padding bits. That means that the value of f.s1 could be:

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | g...a...r...b...a...g...e |   r...u...b...b...i...s...h   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The garbage arises because those 7 bytes need not have been copied (footnote 51 says that is an option, even though it is unlikely to be exercised by any current compiler). The rubbish arises because the initialization of d never set any values in those bytes; the contents of that part of the structure are unspecified.

If you now go ahead and try to treat f as a copy of d, you might be a little surprised to find that only 1 byte of the 8 relevant bytes of f.s2 is actually initialized.
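Besides f = d;, another way to avoid relying on bytes that were never copied is to let the tag decide which member to copy. A minimal sketch, again using the Type_A/Type_B layout; the helper copy_active is hypothetical, and plain f = d; remains the simpler choice.

```c
#include <assert.h>

struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };

struct foo {
    enum whichmember w;
    union {
        struct Type_A s1;
        struct Type_B s2;
    };
};

/* Copies *d into *f, using d->w to pick the union member that was
   actually stored, so only the active member's value is copied. */
void copy_active(struct foo *f, const struct foo *d)
{
    f->w = d->w;
    switch (d->w) {
    case TYPE_A:
        f->s1 = d->s1;
        break;
    case TYPE_B:
        f->s2 = d->s2;
        break;
    default:
        assert(!"invalid tag");
        break;
    }
}
```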

I'll re-emphasize: I know of no compiler that would do this. But the question is tagged language-lawyer, so the issue is what the language standard states, and this is my interpretation of the quoted sections of the standard.



Answer 2:

Yes, your code will work, because with a union the compiler uses the same memory for all of its members.

For example, if &f.my_struct is 100, then &f.my_struct2 and &f.my_struct3 are also 100.

If my_struct is the largest member, then it will work every time.
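The address part of this claim is guaranteed by C11 §6.7.2.1: a pointer to a union object, suitably converted, points to each of its members. A minimal sketch checking it, with hypothetical member types and a hypothetical helper members_share_address; it checks only the addresses, not the contents of the bytes.

```c
#include <assert.h>

/* Hypothetical member types of different sizes. */
struct some_struct  { double big[4]; };
struct some_struct2 { int count; };
struct some_struct3 { int id; int value; };

union members {
    struct some_struct  my_struct;
    struct some_struct2 my_struct2;
    struct some_struct3 my_struct3;
};

/* Returns 1 if every member of *p begins at the union's own address. */
int members_share_address(const union members *p)
{
    assert(p);
    const void *base = p;
    return (const void *)&p->my_struct  == base
        && (const void *)&p->my_struct2 == base
        && (const void *)&p->my_struct3 == base;
}
```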



Answer 3:

Yes, you can access them directly. You can assign a value to one union member and read it back through a different union member. The result will be deterministic and correct.
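Reading through a different member than the one last stored is what footnote 95, quoted in Answer 1, calls type punning. A minimal sketch, assuming float is a 32-bit IEEE 754 binary32 type with the same byte order as uint32_t (common, but not required by C); the union pun and the functions float_bits and check_float_bits are hypothetical. Reading the uint32_t member is safe here because uint32_t has no padding bits and so no trap representations; a member type that can have trap representations would not give that assurance.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide, like uint32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

union pun {
    float    f;
    uint32_t u;
};

/* Stores x through the float member and reads the same bytes back
   through the uint32_t member (type punning, C11 footnote 95). */
uint32_t float_bits(float x)
{
    union pun p;
    p.f = x;
    return p.u;
}

/* Checks values with well-known IEEE 754 binary32 encodings. */
void check_float_bits(void)
{
    assert(float_bits(1.0f) == 0x3F800000u);
    assert(float_bits(-2.0f) == 0xC0000000u);
}
```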