How exactly does the standard define that, for example, float (*(*(&e)[10])())[5] declares a variable of type "reference to array of 10 pointer to function of () returning pointer to array of 5 float"?
Inspired by discussion with @DanNissenbaum
I refer to the C++11 standard in this post.
Declarations
Declarations of the type we're concerned with are known as simple-declarations in the grammar of C++, which take one of the following two forms (§7/1):
    decl-specifier-seq(opt) init-declarator-list(opt) ;
    attribute-specifier-seq decl-specifier-seq(opt) init-declarator-list ;
The attribute-specifier-seq is a sequence of attributes ([[something]]) and/or alignment specifiers (alignas(something)). Since these don't affect the type of the declaration, we can ignore them and the second of the above two forms.
Declaration specifiers
So the first part of our declaration, the decl-specifier-seq, is made up of declaration specifiers. These include some things that we can ignore, such as storage class specifiers (static, extern, etc.), function specifiers (inline, etc.), the friend specifier, and so on. However, the one declaration specifier of interest to us is the type specifier, which may include simple type keywords (char, int, unsigned, etc.), names of user-defined types, cv-qualifiers (const or volatile), and others that we don't care about.
Example: A simple example of a decl-specifier-seq which is just a sequence of type specifiers is const int. Another one could be unsigned int volatile.
You may think "Oh, so something like const volatile int int float const is also a decl-specifier-seq?" You'd be right that it fits the rules of the grammar, but the semantic rules disallow such a decl-specifier-seq. In fact, only one type specifier is allowed, except for certain combinations (such as unsigned with int, or const with anything except itself), and at least one type specifier that is not a cv-qualifier is required (§7.1.6/2-3).
Quick Quiz (you might need to reference the standard)
Is const int const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?
Is unsigned const int a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?
Is auto const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?
Is int * const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?
Declarators
The second part of a simple-declaration is the init-declarator-list. It is a sequence of declarators separated by commas, each with an optional initializer (§8). Each declarator introduces a single variable or function into the program. The simplest form of declarator is just the name you're introducing, the declarator-id. The declaration int x, y = 5; has a declaration specifier sequence that is just int, followed by two declarators, x and y, the second of which has an initializer. We will, however, ignore initializers for the rest of this post.
A declarator can have a particularly complex syntax because this is the part of the declaration that allows you to specify whether the variable is a pointer, reference, array, function pointer, etc. Note that these are all part of the declarator and not the declaration as a whole. This is precisely the reason why int* x, y; does not declare two pointers: the asterisk * is part of the declarator of x and not part of the declarator of y. One important rule is that every declarator must have exactly one declarator-id, the name it is declaring. The rest of the rules about valid declarators are enforced once the type of the declaration is determined (we'll come to it later).
Example: A simple example of a declarator is *const p, which declares a const pointer to... something. The type it points to is given by the declaration specifiers in its declaration. A more terrifying example is the one given in the question, (*(*(&e)[10])())[5], which declares a reference to an array of function pointers that return pointers to... again, the final part of the type is actually given by the declaration specifiers.
You're unlikely to ever come across such horrible declarators, but sometimes similar ones do appear. Being able to read a declaration like the one in the question is a useful skill that comes with practice, and it helps to understand how the standard interprets the type of a declaration.
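The point that pointer-ness belongs to each declarator rather than to the declaration as a whole can be checked with a compiler. The following is a minimal sketch, assuming a C++11 compiler; the use of extern (so that nothing has to be defined or initialized), decltype and the <type_traits> checks are my additions and not part of the original discussion, and the declaration specifier int for p is chosen only for illustration.

```cpp
#include <type_traits>

// 'extern' only declares the variables, so no definitions or
// initializers are needed for the compile-time checks below.
extern int* x, y;  // the '*' is part of the declarator of x only

static_assert(std::is_same<decltype(x), int*>::value,
              "x is a pointer to int");
static_assert(std::is_same<decltype(y), int>::value,
              "y is a plain int, not a pointer");

// Declarator "*const p" combined with the declaration specifier int.
extern int *const p;

static_assert(std::is_pointer<decltype(p)>::value,
              "p is a pointer");
static_assert(std::is_const<decltype(p)>::value,
              "the pointer itself is const");
static_assert(!std::is_const<std::remove_pointer<decltype(p)>::type>::value,
              "the pointed-to int is not const");
```

If the file compiles, every static_assert held; there is nothing to run.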
Quick Quiz (you might need to reference the standard)
Which parts of int const unsigned* const array[50]; are the declaration specifiers and the declarator?
Which parts of volatile char (*fp)(float const), &r = c; are the declaration specifiers and the declarators?
Declaration Types
Now that we know that a declaration is made up of a declaration specifier sequence and a list of declarators, we can begin to think about how the type of a declaration is determined. For example, it might be obvious that int* p; defines p to be a "pointer to int", but for other types it's not so obvious.
A declaration with multiple declarators, let's say two, is treated as a separate declaration of each identifier. That is, int x, *y; is a declaration of identifier x, int x, and a declaration of identifier y, int *y.
Types are expressed in the standard as English-like sentences (such as "pointer to int"). The interpretation of a declaration's type in this English-like form is done in two parts. First, the type of the declaration specifier sequence is determined. Second, a recursive procedure is applied to the declaration as a whole.
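One way to see that a declaration with two declarators behaves like two separate declarations is to redeclare the same variables one at a time and let the compiler confirm that the types agree. This is only a sketch under my own assumptions: the names first and second are placeholders, and extern is used so that the variables may legally be declared more than once without being defined.

```cpp
#include <type_traits>

// One declaration with two declarators, sharing the specifier 'int'.
extern int first, *second;

// The same two entities redeclared one identifier at a time. A
// redeclaration with a different type would be rejected by the compiler.
extern int first;
extern int *second;

static_assert(std::is_same<decltype(first), int>::value,
              "first has type int");
static_assert(std::is_same<decltype(second), int*>::value,
              "second has type pointer to int");
```

Changing either single-identifier redeclaration (for example to extern int second;) should make the translation unit fail to compile, which is the behaviour the equivalence predicts.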
Declaration specifiers type
The type of a declaration specifier sequence is determined by Table 10 of the standard. It lists the types of the sequences given that they contain the corresponding specifiers in any order. So for example, any sequence that contains signed and char in any order, including char signed, has type "signed char". Any cv-qualifier that appears in the declaration specifier sequence is added to the front of the type. So char const signed has type "const signed char". This makes sure that regardless of what order you put the specifiers in, the type will be the same.
Quick Quiz (you might need to reference the standard)
What is the type of the declaration specifier sequence int long const unsigned?
What is the type of the declaration specifier sequence char volatile?
What is the type of the declaration specifier sequence auto const?
Declaration type
Now that we have the type of the declaration specifier sequence, we can work out the type of an entire declaration of an identifier. This is done by applying a recursive procedure defined in §8.3. To explain this procedure, I'll use a running example. We'll work out the type of e in float const (*(*(&e)[10])())[5].
Step 1 The first step is to split the declaration into the form T D where T is the declaration specifier sequence and D is the declarator. So we get:
    T = float const, D = (*(*(&e)[10])())[5]
The type of T is, of course, "const float", as we determined in the previous section. We then look for the subsection of §8.3 that matches the current form of D. You'll find that this is §8.3.4 Arrays, because it states that it applies to declarations of the form T D where D has the form:
    D1 [ constant-expression(opt) ] attribute-specifier-seq(opt)
Our D is indeed of that form where D1 is (*(*(&e)[10])()).
Now imagine a declaration T D1 (we've gotten rid of the [5]):
    float const (*(*(&e)[10])())
Its type is "<some stuff> T". This section states that the type of our identifier, e, is "<some stuff> array of 5 T", where <some stuff> is the same as in the type of the imaginary declaration. So to work out the remainder of the type, we need to work out the type of T D1.
This is the recursion! We recursively work out the type of an inner part of the declaration, stripping a bit of it off at every step.
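The array rule in step 1 can be tried out on a smaller declaration where the imaginary T D1 is easy to read. The sketch below is my own illustration, not part of the standard's wording: it takes T = float const and D1 = (*d1), so that <some stuff> is "pointer to", and checks that appending [5] slots "array of 5" in just before T while "pointer to" stays in front. The names d1, d and pointee are hypothetical.

```cpp
#include <type_traits>

// Imaginary declaration T D1, with T = float const and D1 = (*d1).
extern float const (*d1);

// Full declaration T D, where D has the form D1 [5].
extern float const (*d)[5];

// T D1 has type "pointer to const float": <some stuff> is "pointer to".
static_assert(std::is_same<std::remove_pointer<decltype(d1)>::type,
                           const float>::value,
              "d1 points directly at const float");

// T D has type "pointer to array of 5 const float".
typedef std::remove_pointer<decltype(d)>::type pointee;

static_assert(std::is_pointer<decltype(d)>::value,
              "<some stuff> (pointer to) is still the outermost layer");
static_assert(std::is_array<pointee>::value && std::extent<pointee>::value == 5,
              "array of 5 sits between the pointer and T");
static_assert(std::is_same<std::remove_extent<pointee>::type,
                           const float>::value,
              "the element type is T, const float");
```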
Step 2 So, as before, we split our new declaration into the form T D:
    T = float const, D = (*(*(&e)[10])())
This matches paragraph §8.3/6 where D is of the form ( D1 ). This case is simple: the type of T D is simply the type of T D1:
    float const *(*(&e)[10])()
Step 3 Let's call this T D now and split it up again:
    T = float const, D = *(*(&e)[10])()
This matches §8.3.1 Pointers where D is of the form * D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> pointer to T". So now we need the type of T D1:
    float const (*(&e)[10])()
Step 4 We call it T D and split it up:
    T = float const, D = (*(&e)[10])()
This matches §8.3.5 Functions where D is of the form D1 (). If T D1 has type "<some stuff> T", then T D has type "<some stuff> function of () returning T". So now we need the type of T D1:
    float const (*(&e)[10])
Step 5 We can apply the same rule we did for step 2, where the declarator is simply parenthesised, to end up with:
    float const *(&e)[10]
Step 6 Of course, we split it up:
    T = float const, D = *(&e)[10]
We match §8.3.1 Pointers again with D of the form * D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> pointer to T". So now we need the type of T D1:
    float const (&e)[10]
Step 7 Split it up:
    T = float const, D = (&e)[10]
We match §8.3.4 Arrays again, with D of the form D1 [10]. If T D1 has type "<some stuff> T", then T D has type "<some stuff> array of 10 T". So what is T D1's type?
    float const (&e)
Step 8 Apply the parentheses step again:
    float const &e
Step 9 Split it up:
    T = float const, D = &e
Now we match §8.3.2 References where D is of the form & D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> reference to T". So what is the type of T D1?
    float const e
Step 10 Well it's just "T" of course! There is no <some stuff> at this level. This is given by the base case rule in §8.3/5.
And we're done!
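As a sanity check on the ten steps, the type of e can be peeled apart one layer at a time with a compiler, starting from the outermost layer of the type (which is the innermost part of the declarator). This is a sketch under my own assumptions rather than anything the standard prescribes: return_of is a hypothetical helper written for this example (it is not part of <type_traits>), the aliases L0 to L6 are placeholder names, and extern is used so that the reference does not need an initializer.

```cpp
#include <type_traits>

extern float const (*(*(&e)[10])())[5];

// Hypothetical helper: the return type of a function taking no arguments.
template <class F> struct return_of;
template <class R> struct return_of<R()> { typedef R type; };

typedef decltype(e)                     L0;  // reference to ...
typedef std::remove_reference<L0>::type L1;  // array of 10 ...
typedef std::remove_extent<L1>::type    L2;  // pointer to ...
typedef std::remove_pointer<L2>::type   L3;  // function of () returning ...
typedef return_of<L3>::type             L4;  // pointer to ...
typedef std::remove_pointer<L4>::type   L5;  // array of 5 ...
typedef std::remove_extent<L5>::type    L6;  // const float

static_assert(std::is_lvalue_reference<L0>::value, "step 9: reference to");
static_assert(std::is_array<L1>::value && std::extent<L1>::value == 10,
              "step 7: array of 10");
static_assert(std::is_pointer<L2>::value, "step 6: pointer to");
static_assert(std::is_function<L3>::value, "step 4: function of ()");
static_assert(std::is_pointer<L4>::value, "step 3: pointer to");
static_assert(std::is_array<L5>::value && std::extent<L5>::value == 5,
              "step 1: array of 5");
static_assert(std::is_same<L6, const float>::value, "T: const float");
```

Each alias removes exactly one layer, so the comments read top to bottom as the English-like type.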
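So now if we look at the type we determined at each step, substituting the <some stuff>s from each level below, we can determine the type of e in float const (*(*(&e)[10])())[5]:
    Step 9: reference to
    Step 7: array of 10
    Step 6: pointer to
    Step 4: function of () returning
    Step 3: pointer to
    Step 1: array of 5
    T: const float
If we combine this all together, what we get is:
    reference to array of 10 pointer to function of () returning pointer to array of 5 const float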
Nice! So that shows how the compiler deduces the type of a declaration. Remember that this is applied to each declaration of an identifier if there are multiple declarators. Try figuring out these:
Quick Quiz (you might need to reference the standard)
What is the type of x in the declaration bool **(*x)[123];?
What are the types of y and z in the declaration int const signed *(*y)(int), &z = i;?
If anybody has any corrections, please let me know!
Here's the way I parse float const (*(*(&e)[10])())[5]. First of all, identify the specifier. Here the specifier is float const. Now, let's look at the precedence: [] = () > *. The parentheses are used to override the precedence. With precedence in mind, let's identify the variable ID, which is e. So, e is a reference to an array (since [] > *) of 10 pointers to functions (since () > *) which take no arguments and return a pointer to an array of 5 float const. So the specifier comes last and the rest is parsed according to the precedence.
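The precedence rule ([] and () bind more tightly than *, unless parentheses say otherwise) can be seen in isolation on four small declarations. This is a minimal sketch assuming a C++11 compiler; the names are placeholders of mine, and int is used as the specifier only to keep the declarations short.

```cpp
#include <type_traits>

extern int *arr_of_ptr[3];    // [] binds first: array of 3 pointers to int
extern int (*ptr_to_arr)[3];  // parentheses force * first: pointer to array of 3 int
int *fn_ret_ptr();            // () binds first: function returning pointer to int
extern int (*ptr_to_fn)();    // parentheses force * first: pointer to function returning int

static_assert(std::is_array<decltype(arr_of_ptr)>::value,
              "without parentheses the outermost layer is the array");
static_assert(std::is_pointer<decltype(ptr_to_arr)>::value,
              "with parentheses the outermost layer is the pointer");
static_assert(std::is_function<decltype(fn_ret_ptr)>::value,
              "without parentheses the outermost layer is the function");
static_assert(std::is_pointer<decltype(ptr_to_fn)>::value,
              "with parentheses the outermost layer is the pointer");
```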