I have a question about how C / C++ internally stores multidimensional arrays declared using the notation foo[m][n]
. I am not questioning pure pointers to pointers etc... I am asking because of speed reasons...
Correct me if I am wrong, but syntactically foo
is an array of pointers, which themselves point to an array
int foo[5][4]
*(foo + i) // returns a memory address
*( *(foo + i) + j) // returns an int
I have heard from many places that the C/C++ compiler converts foo[m][n]
to a one dimensional array behind the scenes (calculating the required one dimension index with i * width + j
). However if this was true then the following would hold
*(foo + 1) // should return element foo[0][1]
Thus my question:
Is it true that foo[m][n]
is (always?) stored in memory as a flat one dimensional array?? If so, why does the above code work as shown.
C arrays - even multi-dimensional ones - are contiguous, ie an array of type
int [4][5]
is structurally equivalent to an array of typeint [20]
.However, these types are still incompatible according to C language semantics. In particular, the following code is in violation of the C standard:
The reason for this is that the C standard is (probably intentionally) worded in a way which makes bounds-checking implementations possible: As
p
is derived fromfoo[0]
, which has typeint [5]
, valid indices must be in range0..5
(resp.0..4
if you actually access the element).Many other programming languages (Java, Perl, Python, JavaScript, ...) use jagged arrays to implement multi-dimensional arrays. This is also possible in C by using an array of pointers:
However, jagged arrays are not contiguous, and the pointed-to arrays need not be of uniform size.
Because of implicit conversion of array expressions into pointer expressions, indexing jagged and non-jagged arrays looks identical, but the actual address calculations will be quite different:
Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element
foo[0][1]
, the following code will work:The explicit cast is necessary, because
foo+1
, is the same as&foo[1]
which is not at all the same thing asfoo[0][1]
.*(foo+1)
is a pointer to the fifth element in the flat memory area. In other words,*(foo+1)
is basicallyfoo[1]
and**(foo+1)
isfoo[1][0]
. Here is how the memory is laid out for some of your two dimensional array:A two-dimensional array:
is nothing more or less than an array of arrays:
There are no pointer objects here, either explicit or implicit.
Arrays are not pointers. Pointers are not arrays.
What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn't apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array's first element), but there is no pointer object stored in memory.
The array object
foo
is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguousint
elements; the whole thing is therefore stored as 20 contiguousint
objects.The indexing operator is defined in terms of pointer arithmetic;
x[y]
is equivalent to*(x + y)
. Typically the left operand is going to be either a pointer expression or an array expression; if it's an array expression, the array is implicitly converted to a pointer.So
foo[x][y]
is equivalent to*(foo[x] + y)
, which in turn is equivalent to*(*(foo + x) + y)
. (Note that no casts are necessary.) Fortunately, you don't have to write it that way, andfoo[x][y]
is a lot easier to understand.Note that you can create a data structure that can be accessed with the same
foo[x][y]
syntax, but wherefoo
really is a pointer to pointer to int. (In that case, the prefix of each[]
operator is already a pointer expression, and doesn't need to be converted.) But to do that, you'd have to declarefoo
as a pointer-to-pointer-to-int:and then allocate and initialize all the necessary memory. This is more flexible than
int foo[5][4]
, since you can determine the number of rows and the size (or even existence) of each row dynamically.Section 6 of the comp.lang.c FAQ explains this very well.
EDIT:
In response to Arrakis's comment, it's important to keep in mind the distinction between type and representation.
For example, these two types:
very likely have the same representation in memory (two consecutive
int
objects), but the syntax to access the elements is quite different.Similarly, the types
int[5][4]
andint[20]
have the same memory layout (20 consecutiveint
objects), but the syntax to access the elements is different.You can access
foo[2][2]
as((int*)foo)[10]
(treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it's useful to do so, but strictly speaking the behavior is undefined. You can likely get away with it because most C implementations don't do array bounds-checking. On the other hand, optimizing compilers can assume that your code's behavior is defined, and generate arbitrary code if it isn't.int foo[5][4];
foo
is not an array of pointers; it's an array of arrays. Below image will help.