Is it OK to access past the size of a structure vi

Posted 2019-01-14 21:16

Specifically, is the following code, the line below the marker, OK?

struct S{
    int a;
};

#include <stdlib.h>

int main(){
    struct S *p;
    p = malloc(sizeof(struct S) + 1000);
    // This line:
    *(&(p->a) + 1) = 0;
}

People have argued here, but no one has given a convincing explanation or reference.

Their argument is about slightly different code, but the issue is essentially the same:

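#include <stddef.h>  // offsetof, size_t
#include <stdint.h>  // int64_t
#include <stdlib.h>  // malloc
#include <string.h>  // strlen, strcpy

typedef struct _pack{
    int64_t c;
} pack;

int main(){
    pack *p;
    char str[9] = "aaaaaaaa"; // Input
    size_t len = offsetof(pack, c) + (strlen(str) + 1);
    p = malloc(len);
    // This line, with similar intention:
    strcpy((char*)&(p->c), str);
//                ^^^^^^^
}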

3 answers
Root(大扎)
#2 · 2019-01-14 21:44

The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.

The member p->a is an object of type int. C11 6.5.6p7 says that

7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Thus

&(p->a)

is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.

Now 6.5.6p8 allows one to calculate &(p->a) + 1, which is a pointer to just past the end of the array, so the calculation itself has no undefined behaviour. However, dereferencing such a pointer is invalid. Annex J.2 spells it out; the behaviour is undefined when:

Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.

behavior [...], for which [The C11] Standard imposes no requirements

With the note saying that:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kinds of behaviour wouldn't be acceptable from the standard's point of view: the standard allows every imaginable and unimaginable outcome.
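A minimal sketch of the distinction drawn above, assuming the struct S from the question: the one-past-the-end pointer is computed and compared, but never dereferenced. The helper names sum_member and member_span are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a;
};

/* Hypothetical helper: walks the "array of length one" formed by p->a. */
int sum_member(const struct S *p)
{
    assert(p != NULL);

    const int *first = &p->a;
    const int *end = first + 1;   /* one past the end: computing it is allowed */
    int sum = 0;

    for (const int *it = first; it != end; ++it) {
        sum += *it;               /* only elements before `end` are read */
    }
    return sum;
}

/* Hypothetical helper: subtracting within the same (as-if) array is defined. */
ptrdiff_t member_span(const struct S *p)
{
    assert(p != NULL);
    return (&p->a + 1) - &p->a;
}
```

Here `end` is used only as a loop bound and as an operand of pointer subtraction, which is what 6.5.6p8 permits; applying unary * to `end` would be the case listed in Annex J.2.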


There have been claims that the C11 standard text was written vaguely, that the committee's intention was for this to be allowed, and that it would have been all right previously. That is not true. Read this part of the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].

Question 16

[...]

Response

For an array of arrays, the permitted pointer arithmetic in subclause 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use of the word object as denoting the specific object determined directly by the pointer's type and value, not other objects related to that one by contiguity. Therefore, if an expression exceeds these permissions, the behavior is undefined. For example, the following code has undefined behavior:

 int a[4][5];

 a[1][7] = 0; /* undefined */ 

Some conforming implementations may choose to diagnose an array bounds violation, while others may choose to interpret such attempted accesses successfully with the obvious extended semantics.

(bolded emphasis mine)
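As a rough illustration of staying inside the permissions described in the response, the sketch below reaches the element that a[1][7] "obviously" means by normalising the indices first, so that each subscript stays within its own array. The helper set_flat and the ROWS/COLS constants are assumptions made for this example, not something from the defect report.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Hypothetical helper: stores `value` at the position row * COLS + col,
 * rewriting the indices so the column subscript is always < COLS. */
void set_flat(int a[ROWS][COLS], size_t row, size_t col, int value)
{
    size_t flat = row * COLS + col;

    assert(flat < (size_t)ROWS * COLS);   /* reject positions outside the whole array */
    a[flat / COLS][flat % COLS] = value;  /* both subscripts are in bounds */
}
```

Calling set_flat(a, 1, 7, 0) on an int a[4][5] writes to a[2][2], the element that the "obvious extended semantics" of a[1][7] would reach, without indexing any row out of bounds.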

There is no reason why the same wouldn't apply to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.

If you want to address consecutive structs, you can always take the pointer to the first member, cast it to a pointer to the struct, and advance that instead:

*(int *)((struct S *)&(p->a) + 1) = 0;
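A small sketch of that cast-and-advance idea, assuming the allocation really has room for at least two struct S objects (as the question's malloc(sizeof(struct S) + 1000) does). The helper name next_struct_member is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct S {
    int a;
};

/* Hypothetical helper: returns a pointer to the `a` member of the struct
 * that follows *p. The caller must guarantee that p[1] lies inside the
 * same allocation. */
int *next_struct_member(struct S *p)
{
    assert(p != NULL);

    /* Pointer to the first member, converted back to a pointer to the struct. */
    struct S *self = (struct S *)&p->a;

    /* Arithmetic is done in units of struct S, then converted to the
     * first member of the next struct. */
    return (int *)(self + 1);
}
```

The arithmetic happens on a struct S pointer rather than on the int pointer, so the result is the same address as &p[1].a.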
趁早两清
#3 · 2019-01-14 21:44

This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.

The correct way to achieve what you want is to use a flexible array member, i.e. an array without a size, as the last struct member:

#include <stdlib.h>

typedef struct S {
    int foo;    //avoid flexible array being the only member
    int a[];
} S;

int main(){
    S *p = malloc(sizeof(*p) + 2*sizeof(int));
    p->a[0] = 0;
    p->a[1] = 42;    //Perfectly legal.
}
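The same approach could plausibly be applied to the string example from the question. The sketch below is an assumption about how that might look: the layout (a length field followed by a flexible char array) and the name pack_from_string are made up for illustration and are not the asker's actual type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the question's `pack`: a length header followed
 * by a flexible array member that holds the string bytes. */
typedef struct pack {
    size_t len;     /* number of bytes stored in data, including the '\0' */
    char data[];    /* flexible array member, must be last */
} pack;

/* Allocates a pack large enough for a copy of str; returns NULL on failure.
 * The caller releases it with free(). */
pack *pack_from_string(const char *str)
{
    assert(str != NULL);

    size_t len = strlen(str) + 1;
    pack *p = malloc(sizeof *p + len);  /* header plus len bytes for data[] */
    if (p == NULL) {
        return NULL;
    }
    p->len = len;
    memcpy(p->data, str, len);          /* writes stay inside data[0..len-1] */
    return p;
}
```

With this layout the copy goes into an array that is declared to extend into the extra allocated space, instead of writing past a scalar member.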
#4 · 2019-01-14 21:45

The C standard guarantees, in §6.7.2.1/15, that:

[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

&(p->a) is equivalent to (int *)p, so &(p->a) + 1 will be the address of the first member of the second struct. In this case the structure has only one member, so there will not be any padding and this will work; where a structure does have padding, this code will break and lead to undefined behaviour.
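A minimal sketch of the layout guarantee quoted above, assuming the struct S from the question. The helper names are hypothetical. It only checks where the first member sits and whether the struct is larger than its single member; it does not dereference &(p->a) + 1.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a;
};

/* 6.7.2.1/15: there is no padding before the first member. */
_Static_assert(offsetof(struct S, a) == 0, "first member must be at offset 0");

/* Hypothetical helper: nonzero if a pointer to the struct, suitably
 * converted, equals the pointer to its initial member. */
int points_to_first_member(struct S *p)
{
    assert(p != NULL);
    return (int *)p == &p->a;
}

/* Hypothetical helper: nonzero if this implementation adds trailing
 * padding to struct S. The result is implementation-specific. */
int has_trailing_padding(void)
{
    return sizeof(struct S) != sizeof(int);
}
```

points_to_first_member is expected to hold on any conforming implementation, whereas has_trailing_padding reports something the standard leaves to the implementation.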
