Undefined behaviour of operators in XOR swap algor

2020-02-12 06:52发布

问题:

void swap(int* a, int* b) {
    if (a != b)
        *a ^= *b ^= *a ^= *b;
}

As the above *a ^= *b ^= *a ^= *b is just a shortcut for *a = *a ^ (*b = *b ^ (*a = *a ^ *b)), could (e.g.) the 2nd *a be evaluated (for the XOR) just before the 3rd *a is modified (by the =)?

Does it matter whether I write it in C99/C11/C++98/C++11?

回答1:

The C++11 standard says:

5.17/1: The assignment operator (=) and the compound assignment operators all group right-to-left. (...) the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

1.9/15: If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

So *a ^= *b is sequenced as follows :

  1. *a and *b are calculated. It's NOT determined in which order
  2. the xor operation is performed
  3. the assignment is done, i.e. the new value is stored in *a
  4. the new value is used as result of the expression (*a ^= *b)

Now *b ^= *a ^= *b, which according to priority rule is *b ^= (*a ^= *b) :

  1. *b and (*a ^= *b) are calculated. It's NOT determined in which order. But as *b is not modified by (*a ^= *b) it doesn't matter.
  2. the xor operation is performed
  3. the assignment is done, i.e. the new value is stored in *b

But now to the unspecified sequencing with *a ^= *b ^= *a ^= *b which is according to priority rules *a ^= (*b ^= (*a ^= *b) ):

  1. *a and (*b ^= (*a ^= *b) ) are calculated. It's NOT determined in which order. But as *a IS modified by (*b ^= (*a ^= *b) ). So the result will depend on which value is calculated first. That's clearly an U.B.

Suppose *a is evaluated first, (i.e. before anything else):
you would get the original value of it, which will be xored with the value of (*b ^= (*a ^= *b) ), that is the original *b xored with original *a xored again with *b. This will result in 0 (which will be stored in *a).

Suppose (*b ^= (*a ^= *b) ) is evaluated first, then its result is the original *a, but the content of *a is changed to the original *a xored with the original *b. Thus this will result in the original *b (which will be stored in *a)

By the way, in both cases, *b contains the original value of *a xored twice with *b meaning that *b will contain the original *a.

CONCLUSION: Here is demonstrated that the final value of *b is uniquely determined by this expression, but that the final value of *a is not uniquely defined (two values possible). So it's clearly an UNSPECIFIED/UNDEFINED RESULT ! It may swap or it might lose *a depending on your compiler.

How to make the swapping for sure ?

I've demonstrated above that first two compound assignments are well specified. So we just have to make sure that the last compound assignment is done after it. This can be guaranteed by a comma operator:

5.18/1: A pair of expressions separated by a comma is evaluated left-to-right and the value of the left expression is discarded

Hence, the following would work:

void safe_swap(int* a, int* b) {
    if (a != b)
        *b ^= *a ^= *b, *a ^= *b;
}

EDIT: But why an XOR swapping ?

On some embedded device with no more free memory, one might be obliged in extreme conditions to use such advanced trick. But it has drawbacks.

First it is difficult to understand, and as seen above, error prone. Then it might not be as efficient as it seems. Some implementation dependent experiments show less optimal code: 3 MOV and 3 XOR, compared to only 4 MOV for the classical swap using a temporary variable. Some informal benchmarks suggest that it could be slower by 3 to 8% most of the time.

By the way, the classical swap can also be written in one statement:

void modern_swap(int*a, int*b) {
    if (a!=b) 
        tie(*a,*b)=make_pair(*b,*a);
}