I long thought that in C, all variables had to be declared at the beginning of the function. I know that in C99, the rules are the same as in C++, but what are the variable declaration placement rules for C89/ANSI C?
The following code compiles successfully with gcc -std=c89 and gcc -ansi:
#include <stdio.h>
int main() {
    int i;
    for (i = 0; i < 10; i++) {
        char c = (i % 95) + 32;
        printf("%i: %c\n", i, c);
        char *s;
        s = "some string";
        puts(s);
    }
    return 0;
}
Shouldn't the declarations of c and s cause an error in C89/ANSI mode?
It compiles successfully because GCC allows it as a GNU extension, even though it's not part of the C89 or ANSI standard. If you want to adhere strictly to those standards, you must pass the -pedantic flag, which warns about the extension, or -pedantic-errors, which rejects it.
For C89, you must declare all of your variables at the beginning of a scope block, before any statements.
So your char c declaration is valid, as it is at the top of the for loop's scope block. But the char *s declaration should be an error, because it comes after the printf statement.
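As a rough sketch of what a strictly C89-conforming version of that loop might look like, the declaration of s can move to the top of the loop's block, ahead of the first statement. The function name print_rows and its count parameter are hypothetical, introduced only so the loop can be called and checked; the intent is that something like gcc -std=c89 -pedantic-errors accepts it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper wrapping the loop from the question. */
int print_rows(int count)
{
    int i;              /* declarations come first in the function block */
    int printed = 0;

    assert(count >= 0); /* first statement: in C89 no declaration may follow it in this block */

    for (i = 0; i < count; i++) {
        /* A new block starts here, so declarations are allowed again. */
        char c = (char)((i % 95) + 32);
        const char *s = "some string";  /* moved above the printf call */

        printf("%i: %c\n", i, c);
        puts(s);
        printed++;
    }
    return printed;
}
```

Calling print_rows(10) reproduces the loop from the question.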
Grouping variable declarations at the top of the block is a legacy, likely due to the limitations of old, primitive C compilers. Most modern languages recommend, and sometimes even enforce, declaring local variables at the latest possible point: where they're first initialized. This gets rid of the risk of using a random value by mistake. Separating declaration and initialization also prevents you from using "const" (or "final") when you could.
C++ unfortunately keeps accepting the old top-of-block declaration style for backward compatibility with C (one C compatibility drag out of many others). But C++ tries to move away from it:
- The design of C++ references does not even allow such top-of-the-block grouping: a reference must be initialized where it is declared.
- If you separate the declaration and initialization of a C++ local object, you pay the cost of an extra default constructor call for nothing. If no default (no-argument) constructor exists, you are not even allowed to separate the two!
C99 starts to move C in this same direction.
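In C terms, the point about const might be sketched like this. Both functions are hypothetical and compute the same thing; the second needs C99 or later because its declaration follows a statement.

```c
#include <assert.h>

/* C89 style: declared up front, assigned later, so it cannot be const. */
int scale_c89(int value)
{
    int doubled;                    /* indeterminate until the assignment below */

    assert(value < 1000);
    doubled = value * 2;
    return doubled + 1;
}

/* C99 style: declared where its value is known, so it can be const. */
int scale_c99(int value)
{
    assert(value < 1000);
    const int doubled = value * 2;  /* declaration after a statement: C99 and later */
    return doubled + 1;
}
```

In the second version the compiler rejects any later assignment to doubled, and there is no window in which it holds an indeterminate value.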
If you are worried about not finding where local variables are declared, then you have a much bigger problem: the enclosing block is too long and should be split.
https://www.securecoding.cert.org/confluence/display/cplusplus/DCL19-CPP.+Initialize+automatic+local+variables+on+declaration
From a maintainability, rather than syntactic, standpoint, there are at least three trains of thought:
- Declare all variables at the beginning of the function so they'll be in one place and you'll be able to see the comprehensive list at a glance.
- Declare all variables as close as possible to the place they're first used, so you'll know why each is needed.
- Declare all variables at the beginning of the innermost scope block, so they'll go out of scope as soon as possible and allow the compiler to optimize memory and tell you if you accidentally use them where you hadn't intended.
I generally prefer the first option, as I find the others often force me to hunt through code for the declarations. Defining all variables up front also makes it easier to initialize and watch them from a debugger.
I'll sometimes declare variables within a smaller scope block, but only for a Good Reason, of which I have very few. One example might be after a fork(), to declare variables needed only by the child process. To me, this visual indicator is a helpful reminder of their purpose.
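A minimal sketch of that fork() case, assuming a POSIX system. The function name run_child and the exit code 7 are made up for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child exit, and returns the
   child's exit code to the parent (or -1 on failure). */
int run_child(void)
{
    pid_t pid;
    int status = 0;

    pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Only the child runs this block, so its variables live here. */
        int child_exit_code = 7;    /* arbitrary value for illustration */

        _exit(child_exit_code);
    }

    assert(pid > 0);                /* only the parent gets this far */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The inner block makes it visible at a glance that child_exit_code has no meaning in the parent.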
As noted by others, GCC is permissive in this regard (and possibly other compilers, depending on the arguments they're called with) even when in C89 mode, unless you use -pedantic checking. To be honest, there are not many good reasons not to have -pedantic on; quality modern code should compile without warnings (or with very few, where you know you are deliberately doing something the compiler flags as a possible mistake), so if your code does not compile cleanly with a pedantic setup, it probably needs some attention.
C89 requires that variables be declared before any other statements within each scope; later standards permit declaration closer to use (which can be both more intuitive and more efficient), notably the simultaneous declaration and initialization of a loop control variable in for loops.
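For illustration, here is the same loop written both ways. The function names are hypothetical, and the second form needs C99 or later.

```c
#include <assert.h>

/* C89: the loop counter is declared at the top of the block. */
int sum_to_c89(int limit)
{
    int i;
    int total = 0;

    assert(limit >= 0);
    for (i = 1; i <= limit; i++)
        total += i;
    return total;
}

/* C99 and later: the counter is declared in the for statement
   and is scoped to the loop. */
int sum_to_c99(int limit)
{
    int total = 0;

    assert(limit >= 0);
    for (int i = 1; i <= limit; i++)
        total += i;
    return total;
}
```

In the C99 version, i cannot be read by accident after the loop ends.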
As has been noted, there are two schools of thought on this.
1) Declare everything at the top of functions because the year is 1987.
2) Declare closest to first use and in the smallest scope possible.
My answer to this is DO BOTH! Let me explain:
For long functions, 1) makes refactoring very hard. If you work in a codebase where the developers are against the idea of subroutines, then you'll have 50 variable declarations at the start of the function, and some of them might just be an "i" for a for-loop that's at the very bottom of the function.
I therefore developed declaration-at-the-top-PTSD from this and tried to do option 2) religiously.
I came back around to option one because of one thing: short functions. If your functions are short enough, then you will have few local variables and since the function is short, if you put them at the top of the function, they will still be close to the first use.
Also, the anti-pattern of "declare and set to NULL", which arises when you want to declare at the top but haven't yet made the calculations needed for initialization, is resolved, because the things you need for initialization will likely be received as arguments.
So now my thinking is that you should declare at the top of functions and as close as possible to first use. So BOTH! And the way to do that is with well-divided subroutines.
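As a small sketch of what "both" can look like, here is a hypothetical short helper (count_char is an invented name). Its declarations sit at the top of the function, are initialized from the arguments, and are still only a few lines from their first use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts occurrences of wanted in a NUL-terminated string. */
size_t count_char(const char *text, char wanted)
{
    const char *cursor = text;  /* initialized from an argument, no NULL placeholder */
    size_t hits = 0;

    assert(text != NULL);
    for (; *cursor != '\0'; cursor++) {
        if (*cursor == wanted)
            hits++;
    }
    return hits;
}
```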
But if you're working on a long function, then put things closest to first use, because that way it will be easier to extract methods.
My recipe is this. For each local variable, move its declaration to the bottom, compile, then move the declaration to just before the first compilation error. That's the first use. Do this for all local variables.
int foo = 0;
<code that uses foo>
int bar = 1;
<code that uses bar>
<code that uses foo>
Now, define a scope block that starts before the declaration and move the end until the program compiles:
{
    int foo = 0;
    <code that uses foo>
}
int bar = 1;
<code that uses bar>
>>> First compilation error here
<code that uses foo>
This doesn't compile because there is more code that uses foo. Notice that the compiler was able to get through the code that uses bar because it doesn't use foo. At this point, there are two choices. The mechanical one is to just move the "}" downwards until it compiles; the other is to inspect the code and determine whether the order can be changed to:
{
    int foo = 0;
    <code that uses foo>
    <code that uses foo>
}
int bar = 1;
<code that uses bar>
If the order can be switched, that's probably what you want because it shortens the lifespan of temporary values.
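A concrete, purely illustrative version of the reordered result might look like the following. The function combine and its arithmetic are invented; the point is only that both uses of foo end up inside one block and bar gets its own.

```c
#include <assert.h>

/* Hypothetical function with foo's two uses grouped into one block. */
int combine(int input)
{
    int result = 0;

    assert(input >= 0);
    {
        int foo = input * 2;    /* foo exists only inside this block */

        result += foo;          /* first use of foo */
        result += foo + 1;      /* second use, moved up next to the first */
    }
    /* foo is out of scope here; mentioning it would be a compile error. */
    {
        int bar = 1;

        result += bar;
    }
    return result;
}
```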
Another thing to note: does the value of foo need to be preserved between the blocks of code that use it, or could it be a different foo in each? For example:
int i;
for (i = 0; i < 8; ++i) {
    ...
}
<some stuff>
for (i = 3; i < 32; ++i) {
    ...
}
These situations need more than my procedure. The developer will have to analyse the code to determine what to do.
But the first step is finding the first use. You can do it visually, but sometimes it's just easier to delete the declaration, try to compile, and put it back just above the first use. If that first use is inside an if statement, put it there and check if it compiles. The compiler will then identify other uses. Try to make a scope block that encompasses both uses.
After this mechanical part is done, it becomes easier to analyse where the data is. If a variable is used in a big scope block, analyse the situation and see if you're just using the same variable for two different things (like an "i" that gets used for two for loops). If the uses are unrelated, create new variables for each of these unrelated uses.
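For the shared "i" case, one possible outcome, assuming C99 or later, is to give each loop its own counter. The function name, parameters, and counter names below are hypothetical; the loop bounds mirror the 8 and 32 from the example above.

```c
#include <assert.h>

/* Hypothetical example: two unrelated loops, each with its own counter. */
int count_steps(int first_limit, int second_limit)
{
    int steps = 0;

    assert(first_limit >= 0 && second_limit >= 3);

    for (int low = 0; low < first_limit; ++low)     /* was: i = 0; i < 8 */
        steps++;

    for (int high = 3; high < second_limit; ++high) /* was: i = 3; i < 32 */
        steps++;

    return steps;
}
```

Neither counter outlives its loop, so the two uses can no longer interfere with each other.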
I will quote some statements from the manual for gcc version 4.7.0 for a clear explanation.
"The compiler can accept several base standards, such as ‘c90’ or ‘c++98’, and GNU dialects of those standards, such as ‘gnu90’ or ‘gnu++98’. By specifying a base standard, the compiler will accept all programs following that standard and those using GNU extensions that do not contradict it. For example, ‘-std=c90’ turns off certain features of GCC that are incompatible with ISO C90, such as the asm and typeof keywords, but not other GNU extensions that do not have a meaning in ISO C90, such as omitting the middle term of a ?: expression."
I think the key point of your question is why gcc does not conform to C89 even when the option "-std=c89" is used. I don't know which version of gcc you have, but I don't think it makes a big difference. The gcc developers have told us that the option "-std=c89" just means that the extensions which contradict C89 are turned off. So it has nothing to do with extensions that do not have a meaning in C89. And the extension that lifts the restriction on the placement of variable declarations belongs to the extensions that do not contradict C89.
To be honest, everyone will think at first sight that the option "-std=c89" makes gcc conform to C89 completely. But it doesn't.
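To make the manual's own example concrete, here is a sketch of the "omitted middle term" form of ?:. It assumes GCC or a compatible compiler such as Clang; it is not ISO C, and a pedantic build is expected to diagnose it. The function name value_or_default is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper using the GNU "omitted middle operand" form of ?:. */
int value_or_default(int value, int fallback)
{
    assert(fallback != 0);
    /* GNU extension: value ?: fallback behaves like value ? value : fallback,
       with value evaluated only once. This is not ISO C. */
    return value ?: fallback;
}
```

Like mixed declarations and code, this is accepted under -std=c89 alone because it has no conflicting meaning in the standard.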
As for whether declaring all variables at the beginning is better or worse, that is just a matter of habit.