What is the correct way to check equality between

Posted 2019-06-21 11:20

Question:

I have a multithreaded application that stores data as an array of instances of the following union

union unMember {
    float fData;
    unsigned int uiData;
};

The object that stores this array knows what type the data in the union is, so I don't have problems with UB when retrieving the correct type. However, in other parts of the program I need to test equality between 2 instances of these unions, and in that part of the code the true internal data type is not known. As a result, I can't test equality of the unions using this kind of approach

  unMember un1;
  unMember un2;
  if (un1 == un2) {
     // do stuff
  }

as I get compiler errors. As such, I am simply comparing the float part of the union

  if (un1.fData == un2.fData) {
     // compiles but is it valid?
  }

Now, I have read that it is UB to access any member of a union other than the one that was last written to, so I am wondering whether the code above is a valid way to check equality of my union instances?

This has made me realise that internally I have no idea how unions really work. I had assumed that data was simply stored as a bit pattern and that you could interpret that in whatever way you like depending on the types listed in the union. If this is not the case, what is a safe/correct way to test equality of 2 instances of a union?

Finally, my application is written in C++ but I realise that unions are also part of C, so is there any difference in how they are treated by the 2 languages?

Answer 1:

In general, you need to prepend some kind of indicator of the current union type:

struct myData
{
    int dataType;
    union {
        ...
    } u;
};

Then:

if (un1.dataType != un2.dataType)
    return (1 == 0);
switch(un1.dataType)
{
    case TYPE_1:
        return (un1.u.type1 == un2.u.type1);
    case TYPE_2:
        ...
}

Anyway, the syntax

if (un1.fData == un2.fData) {
    // compiles but is it valid?
}

which does compile and is valid, might not work for two reasons. One is that, as you said, maybe un2 contains an integer and not a floating point. But in that case the equality test will normally fail anyway. The second is that both unions may hold a floating-point value representing the same number, but with a slight rounding error in one of them. Then the test will tell you the numbers are different (bit by bit they are), while their "meaning" is the same.

Floating points are usually compared like

if (fabs(f1 - f2) < error)

to avoid this pitfall.
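
As a rough sketch of the tolerance comparison above: the helper name nearly_equal and both tolerance values are my own assumptions, not something from the question, and the right tolerances depend on your data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b differ by no more than a tolerance
// scaled to their magnitude. Both default tolerances are illustrative only.
bool nearly_equal(float a, float b,
                  float rel_tol = 1e-5f, float abs_tol = 1e-8f) {
    const float diff  = std::fabs(a - b);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    // abs_tol keeps the test meaningful when both values are close to zero.
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

Note that this only makes sense once you know both values really are floats, which is why the type indicator is still needed.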



Answer 2:

Different types are likely to have different storage lengths (two bytes vs say four bytes).

When a union member is written to, all that is guaranteed is that the member written to is correct.

If then you compare a different member, you have no idea what will be in the extra bytes.

The correct method to test for union equality is to have a struct which contains the union and a member which indicates the union member currently in use, and to switch on that member, where the cases of the switch handle the equality check for each possible union member. In other words, you have to store the in-use information along with the union.

E.g.

enum test_enum
{
  TEST_ENUM_INT,
  TEST_ENUM_FLOAT
};

union test_union
{
  int
    test_int;

  float
    test_float;
};

struct test_struct
{
  enum test_enum
    te;

  union test_union
    tu;
};
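
A possible equality check built on those types might look like the sketch below. The definitions are repeated so the snippet compiles on its own, and the function name test_struct_equal is a hypothetical one I made up for illustration.

```cpp
enum test_enum { TEST_ENUM_INT, TEST_ENUM_FLOAT };

union test_union {
    int   test_int;
    float test_float;
};

struct test_struct {
    test_enum  te;  // records which member of tu was last written
    test_union tu;
};

// Hypothetical helper: two instances are equal only if they hold the same
// member type and that member compares equal.
bool test_struct_equal(const test_struct& a, const test_struct& b) {
    if (a.te != b.te)
        return false;
    switch (a.te) {
        case TEST_ENUM_INT:   return a.tu.test_int == b.tu.test_int;
        case TEST_ENUM_FLOAT: return a.tu.test_float == b.tu.test_float;
    }
    return false;  // only reached if te holds an invalid tag
}
```

Because only the member named by te is ever read, this never touches an inactive union member.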


Answer 3:

I think it would be safest if you implemented a class instead. If a construct does not provide a feature (in this case automatically determining the right member to evaluate), then the construct might just not be suitable for your needs and you should use another construct ;) That may be a custom class, or perhaps a VARIANT if you use COM (which is basically a struct as proposed by @lserni).



Answer 4:

In general what you are asking is impossible. Only the memory from the variable that you set would be guaranteed to be what you expect. The other memory is essentially random. However, in your case you can compare it because the size of everything is the same. If I were doing it I would just compare the unsigned ints or do a memcmp. This all relies on the fact that all members of the union have the same size. If you added a double for example all bets would be off. This falls into the bit twiddling that you can do and get away with in C/C++ but it's much harder to maintain. You are making an assumption about the union and it needs to be clear in the code that you made this assumption. A future maintainer could blow it and cause all kinds of hard to debug issues.
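
One way to make that assumption visible in the code is sketched below, assuming a C++11 compiler. The helper name bitwise_equal is hypothetical; the static_assert lines fail the build if someone later adds a member of a different size.

```cpp
#include <cstring>

union unMember {
    float fData;
    unsigned int uiData;
};

// The bitwise comparison below only makes sense while both members occupy
// the same number of bytes; fail the build if that stops being true.
static_assert(sizeof(float) == sizeof(unsigned int),
              "unMember members must have the same size");
static_assert(sizeof(unMember) == sizeof(unsigned int),
              "unMember must not contain extra bytes");

// Hypothetical helper: compares the raw bytes of two unions without
// reading any particular member.
bool bitwise_equal(const unMember& a, const unMember& b) {
    return std::memcmp(&a, &b, sizeof(unMember)) == 0;
}
```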

The best thing to do would be to have a struct with a type flag in it or use something like Boost Variant. When using something like this you would be future proofing yourself and using standard code that future maintainers have a chance at knowing or can look up the documentation on.
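
For illustration, here is a minimal sketch using std::variant, the C++17 standard-library counterpart of Boost Variant (so this swaps in the standard type rather than Boost itself). The alias Member and the function members_equal are hypothetical names.

```cpp
#include <variant>

// Hypothetical alias: a type-safe replacement for the float/unsigned int union.
using Member = std::variant<float, unsigned int>;

// operator== on std::variant first checks that both hold the same
// alternative, then compares the held values with that type's operator==.
bool members_equal(const Member& a, const Member& b) {
    return a == b;
}
```

With this, a Member holding 1.0f and a Member holding 1u compare unequal because the active types differ, and no type flag has to be maintained by hand.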

Another note, you have to define what you mean by equality in the case of floats. If you want a fuzzy comparison then you certainly need to know the type. If you want a bit-wise comparison then that's easy enough.



Answer 5:

In C++, members that are not the last member written to are considered to be uninitialized (and so reading them is undefined behaviour). In C, they are considered to contain the object representation of the member that was written to, which may or may not be a valid object representation.

That is,

union U {
    S x;
    T y;
} u;
u.x = 0;
T t = u.y;    // C++ - reading uninitialized memory - could crash
T t = u.y;    /* C - reading object representation of u.x - could crash */

In practice, C++ reading a union non-assigned member will behave the same as C if the code is sufficiently remote from the code that wrote the assigned member, because the only way for the compiler to generate code that behaves differently is to optimize the read-write combination.

A safe method in both languages (guaranteed not to crash) is to compare the memory contents as an array of char e.g. using memcmp:

union U u1, u2;
u1.x = 0;
u2.x = 0;

memcmp(&u1, &u2, sizeof(union U));

This may not, however, reflect the actual equality of the union members; e.g. for floating-point types two NaN values can have the same memory representation and compare unequal, while -0.0 and 0.0 (negative and positive zero) have different memory representations but compare equal. There is also the issue of the two types having different sizes, or containing bits that do not participate in the value (padding bits, not an issue on most modern commodity platforms). In addition, struct types can contain padding for alignment.
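
The NaN and signed-zero mismatch can be seen with a small sketch like this one, assuming float is an IEEE 754 type (true on most current platforms, but still an assumption). The names same_bytes and kNaN are hypothetical.

```cpp
#include <cstring>
#include <limits>

// Hypothetical helper: true when two floats have identical bytes.
bool same_bytes(float a, float b) {
    return std::memcmp(&a, &b, sizeof(float)) == 0;
}

// A quiet NaN to experiment with (assumes IEEE 754 floats).
const float kNaN = std::numeric_limits<float>::quiet_NaN();

// Under that assumption:
//   same_bytes(kNaN, kNaN)   is true,  but kNaN == kNaN  is false
//   same_bytes(0.0f, -0.0f)  is false, but 0.0f == -0.0f is true
```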



Tags: c++ c unions