可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I recently encountered a case where I need to compare two files (golden and expected) for verification of test results and even though the data written to both the files were same, the files does not match.
On further investigation, I found that there is a structure which contains some integers and a char array of 64 bytes, and not all the bytes of char array were getting used in most of the cases and unused fields from the array contain random data and that was causing the mismatch.
This brought me ask the question whether it is good practice to initialize the array in C/C++ as well, as it is done in Java?
回答1:
It is good practice to initialise memory/variables before you use them - uninitialised variables are a big source of bugs that are often very hard to track down.
Initialising all the data is a very good idea when writing it to a file format: It keeps the file contents cleaner so they are easier to work with, less prone to problems if someone incorrectly tries to "use" the uninitialised data (remember it may not just be your own code that reads the data in future), and makes the files much more compressible.
The only good reason not to initialise variables before you use them is in performance-critical situations, where the initialisation is technically "unnecessary" and incurs a significant overhead. But in most cases initialising variables won't cause significant harm (especially if they are only declared immediately before they are used), but will save you a lot of development time by eliminating a common source of bugs.
回答2:
Using an undefined value in an array results in undefined behaviour.
Thus the program is free to produce differing results. This may mean your files end up slightly different, or that the program crashes, or the program formats your hard drive, or the program causes demons to fly out the users nose ( http://catb.org/jargon/html/N/nasal-demons.html )
This doesn't mean you need to define your array values when you create the array, but you must ensure you initialise any array value before you use it. Of course the simplest way to ensure this is to do this when you create the array.
MyStruct array[10];
printf( "%f", array[2].v ); // POTENTIAL BANG!
array[3].v = 7.0;
...
printf( "%f", array[3].v ); // THIS IS OK.
Don't forget that for huge arrays of PODs there's a nice shorthand to initialise all members to zero
MyPODStruct bigArray[1000] = { 0 };
回答3:
I strongly disagree with the given opinions that doing so is "eliminating a common source of bugs" or "not doing so will mess with your program's correctness". If the program works with unitialized values then it has a bug and is incorrect. Initializing the values does not eliminate this bug, because they often still do not have the expected values at the first use. However, when they contain random garbage, the program is more likely to crash in a random way at every try. Always having the same values may give a more deterministic behaviour in crashing and makes debugging easier.
For your specific question, it is also good security practice to overwrite unused parts before they are written to a file, because they may contain something from a previous use that you do not want to be written, like passwords.
回答4:
If you don't initialize the values in a c++ array, then the values could be anything, so it would be good practice to zero them out if you want predictable results.
But if you use the char array like a null terminated string, then you should be able to write it to a file with the proper function.
Although in c++ it might be better to use a more OOP solution. I.E. vectors, strings, etc.
回答5:
Keep in mind that keeping arrays uninitialized may have advantages like performance.
It's only bad reading from uninitialized arrays. Having them around without ever reading from uninitialized places is fine.
Moreover if your program has bug that makes it read from uninitialized place in array, then "covering it up" by defensively initializing all array to known value is not the solution for bug, and can only make it surface later.
回答6:
One could write a big article on the difference between the two styles one can encounter, people who initialize variables always when declaring them and people who initialize them when necessary. I share a big project with someone who is in the first category and I am now definitly more of the second type.
Always initializing variables has brought more subtle bugs and problems than not and I will try to explain why, remembering the cases I found.
First example:
struct NODE Pop(STACK * Stack)
{
struct NODE node = EMPTY_STACK;
if(Stack && Stack->stackPointer)
node = Stack->node[--Stack->stackPointer];
return node;
}
This was the code written by the other guy. This function is the hottest function in our application (you imagine a text index on 500 000 000 sentences in a ternary tree, the FIFO stack is used to handle the recursion as we do not want to use recursive function calls).
This was typical of his programming style because of his systematic initialization of variables. The problem with that code was the hidden memcpy
of the initialization and the two other copies of the structures (which btw were not calls to memcpy
gcc's strange sometimes), so we had 3 copies + a hidden function call in the hottest function of the project.
Rewriting it to
struct NODE Pop(STACK * Stack)
{
if(Stack && Stack->stackPointer)
return Stack->node[--Stack->stackPointer];
return EMPTY_STACK;
}
Only one copy (and supplemental benefit on SPARC where it runs, the function is a leaf function thanks to the avoided call to memcpy
and does not need to build a new register window). So the function was 4 times faster.
Another problem I found ounce but do not remember where exactly (so no code example, sorry). A variable that was initialized when declared but it was used in a loop, with switch
in a finite state automaton. The problem the initialization value was not one of the states of the automaton and in some extremly rare cases the automaton didn't work correctly. By removing the initializer, the warning the compiler emitted made it obvious that the variable could be used before it was properly initialized. Fixing the automaton was easy then.
Morality: defensively initialising a variable may suppress a very usefull warning of the compiler.
Conclusion: Initialise your variables wisely. Doing it systematicaly is nothing more than following a cargo-cult (my buddy at work is the worse cargo-culter one can imagine, he never uses goto, always initialize a variable, use a lot of static declarations (it's faster ye know (it's in fact even really slow on SPARC 64bit), makes all functions inline
even if they have 500 lines (using __attribute__((always_inline))
when the compiler does not want)
回答7:
First, you should initialize arrays, variables, etc. if not doing so will mess with your program's correctness.
Second, it appears that in this particular case, not initializing the array did not affect the correctness of the original program. Instead, the program meant to compare the files does not know enough about the file format used to tell if the files differ in a meaningful way ("meaningful" defined by the first program).
Instead of complaining about the original program, I would fix the comparison program to know more about the file format in question. If the file format isn't well documented then you've got a good reason to complain.
回答8:
I would say that the good practice in C++ is using a std::vector<> instead of an array. This is not valid for C, of course.