In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing
switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
- performance-impact-of-fno-strict-aliasing
- strict-aliasing
- when-is-char-safe-for-strict-pointer-aliasing
- how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing
would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.
Source:
#include <string.h>
struct iw_event { /* dummy! */
int len;
};
char *iwe_stream_add_event(
char *stream, /* Stream of events */
char *ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if ((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on CYGWIN wih -O3 (please correct me if I am wrong--my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
The Common Initial Sequence rule of C used to be interpreted as making it possible to write a function which could work on the leading portion of a wide variety of structure types, provided they start with elements of matching types. Under C99, the rule was changed so that it only applied if the structure types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if the accesses are performed through the union type, notwithstanding the facts that:
There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
Although the CIS rule was described in terms of unions, its primary usefulness lay in what it implied about the way in which structs were laid out and accessed. If S1 and S2 were structures that shared a CIS, there would be no way that a function that accepted a pointer to an S1 and an S2 from an outside source could comply with C89's CIS rules without allowing the same behavior to be useful with pointers to structures that weren't actually inside a union object; specifying CIS support for structures would thus have been redundant given that it was already specified for unions.
here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
it caused certain shapes in a CAD program to be drawn incorrectly. thank goodness for the project's leaders work on creating a regression test suite.
the bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries. and then only with -O2 turned on. -fno-strict-aliasing solved it.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
(Note with hindsight: this code is fine, but Linux's implementation of
memcpy
was a macro that cast tolong *
to copy in larger chunks. With a correctly-definedmemcpy
,gcc -fstrict-aliasing
isn't allowed to break this code. But it means you need inline asm to define a kernelmemcpy
if your compiler doesn't know how turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7)http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html:
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
With gcc 4.1.2 on CentOS, I get:
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
__attribute__((noinline))
for copy().-fno-strict-aliasing
.b[][n]
tob[][2]
.-O2
or-O3
.Further notes:
copy()
like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)-Wstrict-aliasing=
, did anything here.Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
produces the results:
It seems it is the combination
-fstrict-aliasing
with-finline
which causes the bug.The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.