Question:
Strict aliasing prevents us from accessing the same memory location using an incompatible type.
int* i = malloc( sizeof( int ) ) ; //assuming sizeof( int ) >= sizeof( float )
*i = 123 ;
float* f = ( float* )i ;
*f = 3.14f ;
This would be illegal according to the C standard, because the compiler "knows" that an int cannot be accessed through a float lvalue.
What if I use the pointers so that each one points to its own, correctly aligned memory, like this:
int* i = malloc( sizeof( int ) + sizeof( float ) + MAX_PAD ) ;
*i = 456 ;
First I allocate memory for an int, a float, and a last part that allows the float to be stored at an aligned address. A float typically has to be aligned on a multiple of 4. MAX_PAD is usually 8 or 16 bytes, depending on the system; in any case, MAX_PAD is large enough for the float to be aligned properly.
Then I write an int into i; so far, so good.
float* f = ( float* )( ( char* )i + sizeof( int ) + PaddingBytesFloat( (char*)i ) ) ;
*f= 2.71f ;
I take the pointer i, advance it by the size of an int, and align the result with the function PaddingBytesFloat(), which returns the number of bytes required to align a float at a given address. Then I write a float there.
In this case, f points to a different memory location that doesn't overlap with the int, and that location has a different type.
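For reference, here is a minimal, compilable sketch of the second example (C11). MAX_PAD and PaddingBytesFloat() are only described in the question, so both definitions below are assumptions: MAX_PAD is taken as the worst case, _Alignof(float) - 1, and PaddingBytesFloat() inspects the address through uintptr_t, which is implementation-defined but works on common platforms. One detail differs from the question: the padding is computed for the address just past the int, which is where the float would otherwise start. Since malloc() already returns memory aligned for any fundamental type, padding computed for i itself would be 0, so the question's call only lands on an aligned address when sizeof(int) happens to be a multiple of the float alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical implementation of the question's PaddingBytesFloat():
   number of bytes to add to addr so that it is suitably aligned for float.
   Converting a pointer to uintptr_t is implementation-defined. */
static size_t PaddingBytesFloat(const char *addr)
{
    size_t misalign = (size_t)((uintptr_t)addr % _Alignof(float));
    return misalign ? _Alignof(float) - misalign : 0;
}

/* Sketch of the second example; returns 0 on success. */
static int store_int_then_float(int *out_i, float *out_f)
{
    const size_t MAX_PAD = _Alignof(float) - 1;   /* assumed worst case */
    int *i = malloc(sizeof(int) + sizeof(float) + MAX_PAD);
    if (i == NULL)
        return -1;
    *i = 456;   /* these sizeof(int) bytes now have effective type int */

    /* Align the address right after the int, where the float goes. */
    char *after_int = (char *)i + sizeof(int);
    float *f = (float *)(after_int + PaddingBytesFloat(after_int));
    *f = 2.71f; /* separate, non-overlapping bytes with effective type float */

    *out_i = *i;
    *out_f = *f;
    free(i);
    return 0;
}
```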
Here are the parts of the standard (ISO/IEC 9899:201x, section 6.5) that I was relying on when writing this example.
Aliasing is when more than one lvalue refers to the same memory location. The standard requires those lvalues to have a type compatible with the effective type of the object.
What is the effective type? Quoting the standard:
The effective type of an object for an access to its stored value is the declared type of the
object, if any.87) If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent accesses that do not modify
the stored value. If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the effective type
of the modified object for that access and for subsequent accesses that do not modify the
value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is
simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
I'm trying to connect the pieces and figure out whether this is allowed. In my interpretation, the effective type of an allocated object can change depending on the type of the lvalue used to access that memory, because of this part: "For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access."
Is this legal? If not, what if I used a void pointer instead of the int pointer i in my second example? And if even that wouldn't work, what if I obtained the address that is assigned to the float pointer in the second example as a memcpy'd value, so that the address was never used as an lvalue before?
Answer 1:
I think that yes, it is legal.
To illustrate my point, let's see this code:
#include <stddef.h> /* offsetof */
#include <stdlib.h> /* malloc */

struct S
{
int i;
float f;
};
char *p = malloc(sizeof(struct S));
int *i = (int *)(p + offsetof(struct S, i)); //this offset is 0 by definition
*i = 456;
float *f = (float *)(p + offsetof(struct S, f));
*f = 2.71f;
This code is, IMO, clearly legal, and from a compiler's point of view it is equivalent to yours, for appropriate values of PaddingBytesFloat() and MAX_PAD.
Note that my code does not use any lvalue of type struct S; the struct is only used to ease the calculation of the padding.
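The layout facts this struct-based argument leans on can be checked by the compiler. A small sketch, assuming a C11 compiler for _Static_assert and _Alignof; the helper float_offset() is hypothetical and only exposes the offset that a hand-written PaddingBytesFloat()/MAX_PAD scheme would have to reproduce.

```c
#include <assert.h>
#include <stddef.h>

struct S
{
    int i;
    float f;
};

/* Layout properties the equivalence argument relies on (C11). */
_Static_assert(offsetof(struct S, i) == 0, "first member starts at offset 0");
_Static_assert(offsetof(struct S, f) >= sizeof(int), "f starts after i, so they do not overlap");
_Static_assert(offsetof(struct S, f) % _Alignof(float) == 0, "f is aligned for float");

/* Hypothetical helper: where the float lives relative to the block start. */
static size_t float_offset(void)
{
    return offsetof(struct S, f);
}
```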
As I read the standard, malloc'ed memory has no declared type at all (footnote 87); its effective type is set by whatever is written there. Thus the effective type of such memory can be changed at any time by overwriting it with a value of a different type, much like a union.
TL;DR: My conclusion is that with dynamic memory you are safe with regard to strict aliasing, as long as you read the memory using the same type (or a compatible one) that you last used to write to it.
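A minimal sketch of that rule of thumb, under this answer's reading of 6.5p6: the same allocated bytes are reused for a different type, and every read uses the type of the most recent write. The function name reuse_block() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Reuses one allocated block first as an int, then as a float.
   Returns 0 on success. */
static int reuse_block(int *int_seen, float *float_seen)
{
    /* Big enough for either type; malloc's result is aligned for both. */
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *block = malloc(size);
    if (block == NULL)
        return -1;

    int *ip = block;
    *ip = 123;          /* store through int: effective type becomes int */
    *int_seen = *ip;    /* read with the type last written */

    float *fp = block;
    *fp = 3.14f;        /* store through float: effective type becomes float */
    *float_seen = *fp;  /* read with the type last written */

    /* Reading *ip here would access a float object through an int lvalue,
       which 6.5p7 does not allow, so the sketch does not do it. */
    free(block);
    return 0;
}
```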
Answer 2:
Yes, this is legal. To see why, you don't even need to think about the strict aliasing rule, because it doesn't apply in this case.
According to the C99 standard, when you do this:
int* i = malloc( sizeof( int ) + sizeof( float ) + MAX_PAD ) ;
*i = 456 ;
malloc will return a pointer to a memory block large enough to hold an object of size sizeof(int)+sizeof(float)+MAX_PAD. However, notice that you are only using a small piece of this block; in particular, you are only using the first sizeof(int) bytes. Consequently, you are leaving some free space that can be used to store other objects, as long as you store them at a disjoint offset (that is, after the first sizeof(int) bytes). This is closely related to the definition of what exactly an object is. From C99 section 3.14:
Object: region of data storage in the execution environment, the
contents of which can represent values
The contents of the object pointed to by i represent the value 456; this implies that the integer object itself takes up only a small portion (the first sizeof(int) bytes) of the memory block you allocated. Nothing in the standard stops you from storing a new, different object of any type a few bytes further on.
This code:
float* f = ( float* )( ( char* )i + sizeof( int ) + PaddingBytesFloat( (char*)i ) ) ;
*f= 2.71f ;
is effectively placing another object in a sub-block of the allocated memory. As long as the memory location for f doesn't overlap with that of i, is suitably aligned, and there is enough room left to store a float, you will always be safe. The strict aliasing rule doesn't even apply here, because the pointers point to objects that do not overlap; the memory locations are different.
I think the key point is to understand that you are effectively manipulating two distinct objects through two distinct pointers. Both pointers just happen to point into the same malloc()'d block, but they are far enough apart that this is not a problem.
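The "no overlap" condition can be stated as a check on byte ranges. A minimal sketch with a hypothetical helper disjoint(); note that relational comparison of pointers is only defined when both point into (or one past) the same object, such as a single malloc()'d block, which is the situation discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if [a, a + na) and [b, b + nb) share no byte.
   Both ranges must lie within the same object for the comparisons to be
   well-defined. */
static bool disjoint(const char *a, size_t na, const char *b, size_t nb)
{
    return a + na <= b || b + nb <= a;
}
```

For the question's layout, the int occupies [i, i + sizeof(int)) and the float [f, f + sizeof(float)); they are disjoint because f is at least sizeof(int) bytes past i.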
You can have a look at this related question: What alignment issues limit the use of a block of memory created by malloc? and read Eric Postpischil's great answer: https://stackoverflow.com/a/21141161/2793118. After all, if you can store arrays of different types in the same malloc() block, why couldn't you store an int and a float? You can even view your code as the special case in which these arrays are one-element arrays.
As long as you take care of alignment issues, the code is perfectly fine and 100% portable.
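To illustrate the "arrays of different types in one block" view, here is a sketch (hypothetical function sum_mixed()) that stores n ints followed by m doubles in a single malloc() block. The double array's offset is rounded up to _Alignof(double) (C11); because malloc()'s result is aligned for any fundamental type, base + offset is then aligned for double as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores n ints, then m doubles, in one block and returns their sum
   (or -1.0 if allocation fails). */
static double sum_mixed(size_t n, size_t m)
{
    size_t align = _Alignof(double);
    /* Round the end of the int array up to a multiple of align. */
    size_t dbl_off = (n * sizeof(int) + align - 1) / align * align;
    char *base = malloc(dbl_off + m * sizeof(double));
    if (base == NULL)
        return -1.0;

    int *ints = (int *)base;                   /* bytes [0, n * sizeof(int)) */
    double *dbls = (double *)(base + dbl_off); /* disjoint and aligned */
    for (size_t k = 0; k < n; k++)
        ints[k] = (int)k;                      /* 0, 1, ..., n - 1 */
    for (size_t k = 0; k < m; k++)
        dbls[k] = 0.5;

    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += ints[k];
    for (size_t k = 0; k < m; k++)
        sum += dbls[k];
    free(base);
    return sum;
}
```

With n = 1 and m = 1 this is exactly the question's case, one int followed by one aligned floating-point object.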
UPDATE (follow-up, read comments below):
I believe your reasoning that the standard does not enforce strict aliasing on malloc()'d objects is wrong. It is true that the effective type of a dynamically allocated object can be changed, as the standard says (it is a matter of using an lvalue expression of a different type to store a new value there), but note that once you do that, it is your job to ensure that no lvalue expression of another, incompatible type accesses the object's value. This is enforced by paragraph 7 of section 6.5, and you quoted it in your question:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object;
Thus, by the time you change the effective type of an object, you are implicitly promising to the compiler that you won't access this object using an old pointer with an incompatible type (compared to the new effective type). This should be enough for the purposes of the strict aliasing rule.
Answer 3:
I found a nice analogy that you may also find useful. Quoting from ISO/IEC 9899:TC2 Committee Draft, May 6, 2005, WG14/N1124:
6.7.2.1 Structure and union specifiers
[16] As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply. However, when a . (or
->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member,
it behaves as if that member were replaced with the longest array
(with the same element type) that would not make the structure larger
than the object being accessed; the offset of the array shall remain
that of the flexible array member, even if this would differ from that
of the replacement array. If this array would have no elements, it
behaves as if it had one element but the behavior is undefined if any
attempt is made to access that element or to generate a pointer one
past it.
[17] EXAMPLE After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical
way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the
call to malloc succeeds, the object pointed to by p behaves, for most
purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A fairer comparison would use an example like:
struct ss {
double da;
int ia[];
}; // sizeof(double) >= sizeof(int)
In the example in the quote above, the size of struct s is the same as that of an int (plus padding), and it is then followed by a double (or some other type; float in your case).
Accessing memory sizeof(int) + PADDING bytes after the start of the struct as a double (using the flexible array member as syntactic sugar) looks fine according to this example, so I believe your example is legal C.
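Applied to the question, the analogy looks like this: a struct with an int header and a flexible array of float, allocated with room for one float. The struct name int_then_floats and the function make_pair() are hypothetical; the point is that the compiler, not a hand-written PaddingBytesFloat(), inserts whatever padding the float needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout mirroring the question: an int, then floats. */
struct int_then_floats
{
    int i;
    float f[];   /* flexible array member (C99) */
};

/* Allocates the header plus exactly one float; returns NULL on failure.
   The caller frees the result. */
static struct int_then_floats *make_pair(int iv, float fv)
{
    struct int_then_floats *p = malloc(sizeof *p + sizeof(float));
    if (p == NULL)
        return NULL;
    p->i = iv;
    p->f[0] = fv;   /* compiler-chosen, correctly aligned offset */
    return p;
}
```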
Answer 4:
The strict aliasing rules exist to allow more aggressive compiler optimizations, specifically the ability to reorder accesses to different types without having to worry about whether they refer to the same location. So, for instance, in your first example it is perfectly legal for a compiler to reorder the writes to i and f, and thus your code is an example of undefined behaviour (UB).
There is an exception to this rule for character types, and you have the relevant quote from the standard: "having a type that is not a character type".
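A common way to see what the rule permits is a function that writes through an int pointer and a float pointer and then reads the int back. Because int and float are incompatible types, an optimizing compiler may assume the two pointers do not alias, reorder the stores, and return the constant 1 without reloading *i. The function name write_both() is hypothetical; whether a given compiler actually performs this optimization depends on its flags.

```c
#include <assert.h>

/* The compiler may assume *i and *f are different objects. */
static int write_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;   /* assumed not to modify *i */
    return *i;   /* may be compiled as "return 1" */
}
```

Calling it with two separate objects is well-defined; passing the same address for both (as the first example effectively does) is where the undefined behaviour comes from.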
Your second bit of code is entirely safe. The memory regions do not overlap, so it does not matter if memory accesses are reordered across that boundary. Indeed, the behaviour of the two pieces of code is completely different. The first one places an int in a memory region and then a float into the same memory region, whereas the second one places an int into a memory region and a float into a bit of memory next to it. Even if these accesses are reordered, your code will have the same effect. Perfectly legal.
I get the feeling I may have missed the real question here.
The safest ways to fiddle with low-level memory, if you really did want the behaviour of your first program, are (a) a union or (b) a char *. Using a char * and then casting to the proper type is common in a lot of C code, e.g. in this pcap tutorial (scroll down to "for all those new C programmers who insist that pointers are useless, I smite you.").