There are several flavors of arrays, depending on how and where they are declared.
Fixed-length Arrays
Fixed-length arrays must have their size determined at compile time. You cannot change the size of a fixed-length array after it has been defined.
Fixed-length arrays are declared in one of the following ways:
T a[N];
T a[N] = { /* initializer list */ };
char_type a[N] = "string literal";
T a[] = { /* initializer list */ };
char_type a[] = "string literal";
In the first three cases, N must be a constant expression whose value is known at compile time, and the size of the array is taken from N; in the last two cases, it's taken from the number of elements in the initializer list or the size of the string literal (including its terminator).
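As a rough illustration of where the size comes from in each form, the sketch below counts elements with sizeof. The COUNT_OF macro and the function names are made up for this example; they are not standard library facilities.

```c
#include <stddef.h>

/* Number of elements in an array object (gives the wrong answer for a pointer). */
#define COUNT_OF(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Size taken from the explicit N. */
size_t explicit_count(void)
{
  int a[8] = { 1, 2, 3 };
  return COUNT_OF(a);           /* 8 */
}

/* Size taken from the number of initializers. */
size_t list_count(void)
{
  int a[] = { 1, 2, 3 };
  return COUNT_OF(a);           /* 3 */
}

/* Size taken from the string literal, including its terminator. */
size_t literal_count(void)
{
  char a[] = "hello";
  return COUNT_OF(a);           /* 6 */
}
```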
The initial contents of a fixed-length array depend on its storage duration and whether an initializer has been supplied.
If the array has static storage duration (meaning it was declared at file scope outside of any function body, or was declared with the static keyword) and no initializer is present, then all of the array elements are initialized to 0 (for arithmetic types) or NULL (for pointers). If T is an aggregate type such as a struct or an array type, then each member of the aggregate is initialized with a 0 or NULL, recursively. For a union type, the first named member is initialized this way.
If the array has auto storage duration (meaning it was declared within a function or block without the static keyword) and no initializer is present, then the contents of the array are indeterminate - basically, garbage.
If the array is declared with an initializer list (regardless of storage duration), then the initial values of the array elements correspond to the initializer. If there are fewer elements in the initializer than in the array (for example, N is 10 but you only initialize the first 5 elements), then the remaining elements are initialized as though the array had static storage duration. In other words, given the declaration
int a[10] = {0, 1, 2};
the initial contents of the array are {0, 1, 2, 0, 0, 0, 0, 0, 0, 0}.
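The two well-defined cases above (static storage with no initializer, and a partial initializer list) can be checked with a small sketch like the following. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

static int zeros[4];            /* static storage, no initializer: all 0 */

/* Returns true if every element of zeros is 0. */
bool static_array_is_zeroed(void)
{
  for (size_t i = 0; i < sizeof zeros / sizeof zeros[0]; i++)
    if (zeros[i] != 0)
      return false;
  return true;
}

/* Sums an automatic array that has a partial initializer. */
int partial_init_sum(void)
{
  int a[10] = { 0, 1, 2 };      /* a[3] through a[9] are set to 0 */
  int sum = 0;
  for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
    sum += a[i];
  return sum;                   /* 0 + 1 + 2 + seven zeros */
}
```

There is deliberately no function that reads an uninitialized automatic array; the values would be indeterminate, so there is nothing meaningful to check.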
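Fixed-length arrays containing string values may be initialized using a string literal. C allows for "wide" character strings, so char_type may be either char or wchar_t (C11 adds char16_t and char32_t). The rules are the same as for regular initializer lists, except that N (if specified) should be at least 1 more than the length of the string to leave room for the string terminator. C does accept an N exactly equal to the length of the string, but then the terminator is not stored and the array does not contain a string.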
This means that
char a[10] = "test";
will be initialized as {'t', 'e', 's', 't', 0, 0, 0, 0, 0, 0} and
char a[] = "test";
will be initialized as {'t', 'e', 's', 't', 0}.
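A quick way to see the difference between the two declarations is to compare sizeof (the size of the array) with strlen (the length of the string stored in it). The function names here are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Array size comes from N; the unused tail is zero-filled. */
size_t padded_size(void)
{
  char a[10] = "test";
  return sizeof a;              /* 10 */
}

/* Array size comes from the literal: 4 characters plus the terminator. */
size_t inferred_size(void)
{
  char a[] = "test";
  return sizeof a;              /* 5 */
}

/* The string length is the same either way. */
size_t padded_length(void)
{
  char a[10] = "test";
  return strlen(a);             /* 4 */
}
```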
Arrays with static storage duration are stored such that they are available as soon as the program is loaded, and aren't released until the program exits. This usually means that they're stored in a memory segment like .data or .bss (or the equivalent for whatever executable format your system uses).
Arrays with auto storage duration are stored such that they are allocated at block or function entry and released at block or function exit (in practice, they'll probably be allocated at function entry, regardless of whether they're limited to a smaller scope within the function) - this typically translates to the stack, although it doesn't have to.
Variable-length Arrays
Variable-length arrays were added in C99 (and made an optional feature in C11) - they behave mostly like fixed-length arrays, except that their size is established at run time; N does not have to be a compile-time constant expression:
int n;
printf( "gimme the array size: " );
if ( scanf( "%d", &n ) != 1 || n <= 0 )
  exit( EXIT_FAILURE ); // the size of a VLA must be a positive value
T a[n]; // for any type T
Contrary to what their name implies, you cannot change the size of a variable-length array after it has been defined. "Variable-length" simply means that the size isn't fixed at compile time, and can change from definition to definition.
Since their size isn't set until runtime, variable-length arrays may not be declared at file scope or with the static keyword, nor can they be declared with an initializer list. Exactly how the space for VLAs is managed is up to the implementation; it may be (and usually is) taken from the stack, but as far as I know it may be taken from somewhere else.
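As a small sketch of the run-time sizing, the function below (a made-up name) declares a VLA and reports its size in bytes. It assumes a compiler with VLA support (C99, or C11 and later without __STDC_NO_VLA__ defined) and a caller that passes a small positive n.

```c
#include <stddef.h>

/* Fills a VLA of n ints with 0..n-1 and returns its size in bytes.
   n must be positive; a very large n may exhaust the stack. */
size_t vla_bytes(int n)
{
  int a[n];                     /* size is fixed when this line executes */
  for (int i = 0; i < n; i++)
    a[i] = i;                   /* no initializer list allowed, so assign */
  return sizeof a;              /* evaluated at run time for a VLA */
}
```

Each call creates a fresh array whose size matches that call's n, which is the sense in which the length is "variable".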
Dynamic Arrays
Dynamic arrays are not really "arrays" as such, at least in terms of the data types of the objects we use to manage them. Dynamic arrays are allocated at runtime using one of malloc, calloc, or realloc, and that storage is held until released with a call to free.
T *p = malloc( sizeof *p * N ); // where N may be either a compile-time or
// run-time expression
...
free( p );
A dynamic array may be resized using the realloc library function, like so:
/**
* If realloc fails, it will return NULL and leave the original array in
* place. We assign the result to a temporary variable so we don't risk
* losing our only reference to that memory.
*/
T *tmp = realloc( p, sizeof *p * new_size );
if ( tmp )
p = tmp;
While the memory for the array elements themselves is taken from the heap (or whatever dynamic memory pool), the memory for the pointer variable p will be allocated from either a .bss or .data segment or from the stack, based on p's storage duration (static or auto).
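Memory allocated with malloc or realloc is not initialized; the contents of that memory will be indeterminate. Memory allocated with calloc is initialized to all bits zero (which is 0 for integer types, and on most platforms also 0.0 and NULL, although the standard doesn't guarantee the latter two).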
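Putting the pieces together, here is one possible sketch of allocating, resizing, and releasing a dynamic array of int. The helper names resize_ints and demo_dynamic are made up for this example, and the error handling (returning -1) is just one reasonable choice.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *arr to new_count ints (new_count > 0).
   Returns 1 on success; on failure returns 0 and leaves *arr untouched. */
int resize_ints(int **arr, size_t new_count)
{
  int *tmp = realloc(*arr, sizeof **arr * new_count);
  if (!tmp)
    return 0;
  *arr = tmp;
  return 1;
}

/* Builds a zeroed array with calloc, doubles it, and returns the sum of
   the elements, or -1 if an allocation fails. */
int demo_dynamic(void)
{
  size_t n = 4;
  int *p = calloc(n, sizeof *p);       /* elements start as 0 */
  if (!p)
    return -1;

  if (!resize_ints(&p, n * 2)) {       /* old block is still valid here */
    free(p);
    return -1;
  }
  for (size_t i = n; i < n * 2; i++)   /* the new tail is uninitialized */
    p[i] = 1;

  int sum = 0;
  for (size_t i = 0; i < n * 2; i++)
    sum += p[i];
  free(p);
  return sum;                          /* 4 zeros + 4 ones */
}
```

realloc preserves the existing elements up to the smaller of the old and new sizes, which is why the first four values are still 0 after the resize.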
Arrays vs. Pointers
At some point, somebody is going to tell you that "an array is just a pointer". That person is not correct.
When you declare an array (either fixed- or variable-length), enough storage is set aside for the elements of that array and nothing else; no storage is set aside for any metadata such as the array length or a pointer to the first element. Given the declaration
T a[N];
then the storage will look something like this:
   +---+
a: |   | a[0]
   +---+
   |   | a[1]
   +---+
   |   | a[2]
   +---+
    ...
   +---+
   |   | a[N-1]
   +---+
There is no object a apart from the array elements themselves (or, more properly, the object a is the elements of the array), and the expression a may not be the target of an assignment.
But...
The expression a[i] is defined as *(a + i); that is, given a pointer value a, offset i elements (not bytes!) from that address and dereference the result. But if a is not a pointer, how can that work?
Like this - except when it is the operand of the sizeof or unary & operators, or is a string literal used as an array initializer in a declaration, an expression of type "N-element array of T" will be converted (or "decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
This has several implications:
- The expressions a, &a, and &a[0] will all yield the same value (the address of the first element of the array), but the types of the expressions will be different (T *, T (*)[N], and T *, respectively);
- The subscript operator [] works equally well with both array expressions and pointer expressions (indeed, it's defined to work on pointer expressions);
- When you pass an array expression to a function, what you are actually passing is a pointer value, not the entire array.
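The three implications in the list above can be checked with a short sketch along these lines. All function names are made up for this example, and the pointers are cast to void * only so that values of different types can be compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Receives a pointer; a parameter written as int arr[] means the same thing. */
size_t size_in_callee(int *arr)
{
  return sizeof arr;                   /* sizeof (int *) */
}

/* True if a, &a, and &a[0] all yield the same address. */
bool same_address(void)
{
  int a[10] = { 0 };
  return (void *) a == (void *) &a && (void *) a == (void *) &a[0];
}

/* True if a[3] and *(a + 3) designate the same element. */
bool subscript_is_pointer_arithmetic(void)
{
  int a[10] = { 0 };
  return &a[3] == a + 3;
}

/* sizeof sees the whole array in the caller, but only a pointer in the callee. */
bool callee_sees_pointer(void)
{
  int a[10] = { 0 };
  return sizeof a == 10 * sizeof (int) && size_in_callee(a) == sizeof (int *);
}
```

Because the callee only ever gets a pointer, a function that needs the element count has to receive it as a separate argument.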
For dynamic arrays, the situation is different. Given the line
T *p = malloc( sizeof *p * N );
then your storage will look something like this:
   +---+
p: |   | ---+
   +---+    |
    ...     |
     +------+
     |
     V
   +---+
   |   | p[0]
   +---+
   |   | p[1]
   +---+
    ...
   +---+
   |   | p[N-1]
   +---+
In this case, p is a separate object from the array. Thus, &p won't give you the same value as p and &p[0], and its type will be T ** as opposed to T (*)[N]. Also, since p is just a pointer variable, you can assign a new value to it (although if you do so without freeing the memory it points to first, you'll create a memory leak).
Similarly, sizeof p won't behave like sizeof a; it will simply yield the size of the pointer variable, not the size of the allocated memory that the pointer points to.
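The same kind of check can be sketched for the pointer case. The function name is made up for this example, and it reports false if the allocation fails.

```c
#include <stdbool.h>
#include <stdlib.h>

/* True if p lives somewhere other than the memory it points to, and
   sizeof p reports only the pointer's own size. */
bool pointer_is_separate_object(void)
{
  int *p = malloc(sizeof *p * 10);
  if (!p)
    return false;

  bool separate  = (void *) &p != (void *) p;    /* &p is where p itself lives */
  bool same_elem = (void *) p == (void *) &p[0]; /* p points at element 0 */
  bool ptr_sized = sizeof p == sizeof (int *);   /* not 10 * sizeof (int) */

  free(p);
  return separate && same_elem && ptr_sized;
}
```

Since sizeof p says nothing about the allocation, the element count of a dynamic array has to be tracked in a variable of your own.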