I recently encountered a case where I needed to compare two files (golden and expected) to verify test results, and even though the data written to both files was the same, the files did not match.
On further investigation, I found a structure that contains some integers and a char array of 64 bytes. In most cases not all bytes of the char array were used, and the unused bytes contained random data, which caused the mismatch.
This led me to ask: is it good practice to initialize arrays in C/C++ as well, as is done in Java?
I strongly disagree with the given opinions that doing so is "eliminating a common source of bugs" or that "not doing so will mess with your program's correctness". If the program works with uninitialized values, then it has a bug and is incorrect. Initializing the values does not eliminate this bug, because they often still do not have the expected values at first use. However, when they contain random garbage, the program is more likely to crash in a different way on every run. Always having the same values may make the crash more deterministic and debugging easier.
For your specific question, it is also good security practice to overwrite unused parts before they are written to a file, because they may contain something from a previous use that you do not want to be written, like passwords.
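A minimal sketch of that idea, assuming a record laid out roughly like the one in the question. The names Record, make_record and same_bytes are hypothetical and not taken from the original code. Clearing the whole structure before filling it gives every byte that later reaches the file a defined value, including the unused tail of the char array and any padding between members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layout: some integers plus a 64-byte char array.
struct Record {
    int id;
    int length;
    char name[64];
};

// Fill a record so that every byte of it has a defined value.
inline void make_record(Record& out, int id, const char* name) {
    assert(name != nullptr);
    std::memset(&out, 0, sizeof out);            // clears unused bytes and padding
    out.id = id;
    std::size_t n = std::strlen(name);
    if (n > sizeof out.name - 1) {
        n = sizeof out.name - 1;                 // truncate, keep a terminating NUL
    }
    std::memcpy(out.name, name, n);
    out.length = static_cast<int>(n);
}

// Byte-wise comparison, which is what a naive file diff effectively does.
inline bool same_bytes(const Record& a, const Record& b) {
    return std::memcmp(&a, &b, sizeof(Record)) == 0;
}
```

With this, two records built from the same inputs compare equal byte for byte, no matter what the memory held before.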
I would say that the good practice in C++ is using a std::vector<> instead of an array. This is not valid for C, of course.
Keep in mind that keeping arrays uninitialized may have advantages like performance.
It is only bad to read from uninitialized arrays. Having them around without ever reading from the uninitialized parts is fine.
Moreover, if your program has a bug that makes it read from an uninitialized place in an array, then "covering it up" by defensively initializing the whole array to a known value does not fix the bug, and can only make it surface later.
One could write a big article on the difference between the two styles one can encounter: people who always initialize variables when declaring them, and people who initialize them when necessary. I share a big project with someone who is in the first category, and I am now definitely more of the second type. Always initializing variables has brought more subtle bugs and problems than not doing so, and I will try to explain why, recalling the cases I found. First example:
This was the code written by the other guy. This function is the hottest function in our application (imagine a text index on 500 000 000 sentences in a ternary tree; the FIFO stack is used to handle the recursion, as we do not want to use recursive function calls). It was typical of his programming style because of his systematic initialization of variables. The problem with that code was the hidden memcpy of the initialization and the two other copies of the structures (which, by the way, were not calls to memcpy; gcc is strange sometimes), so we had 3 copies plus a hidden function call in the hottest function of the project. Rewriting it left only one copy, with the supplemental benefit that on SPARC, where it runs, the function is a leaf function thanks to the avoided call to memcpy and does not need to build a new register window. So the function was 4 times faster.
Another problem I found once, but I do not remember exactly where (so no code example, sorry): a variable was initialized when declared but was used in a loop, with a switch, in a finite state automaton. The problem was that the initialization value was not one of the states of the automaton, and in some extremely rare cases the automaton didn't work correctly. After removing the initializer, the warning the compiler emitted made it obvious that the variable could be used before it was properly initialized. Fixing the automaton was easy then. Moral: defensively initialising a variable may suppress a very useful compiler warning.
Conclusion: initialise your variables wisely. Doing it systematically is nothing more than following a cargo cult (my buddy at work is the worst cargo-culter one can imagine: he never uses goto, always initializes variables, uses a lot of static declarations (it's faster, you know; in fact it is even really slow on 64-bit SPARC), and makes all functions inline even if they have 500 lines, using __attribute__((always_inline)) when the compiler does not want to).
If you don't initialize the values in a C++ array, then the values could be anything, so it would be good practice to zero them out if you want predictable results.
But if you use the char array as a null-terminated string, then you should be able to write it to a file with the proper function.
Although in C++ it might be better to use a more OOP solution, e.g. vectors, strings, etc.
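As a rough illustration of the string-based approach, here is a sketch that writes only the characters actually in use, so no unused buffer bytes can reach the file. The Entry type, the helper names and the one-line text format are assumptions made up for this example, not the asker's real file format.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record; std::string holds exactly the characters in use.
struct Entry {
    int id;
    std::string name;
};

// Write one entry as a line of text: "<id> <name length> <name>\n".
inline void write_entry(std::ostream& os, const Entry& e) {
    assert(e.name.find('\n') == std::string::npos);  // keep one entry per line
    os << e.id << ' ' << e.name.size() << ' ' << e.name << '\n';
}

// Convenience wrapper that returns the serialized form as a string.
inline std::string to_text(const Entry& e) {
    std::ostringstream buffer;
    write_entry(buffer, e);
    return buffer.str();
}
```

Because the output depends only on the id and the characters of the name, two runs that store the same data produce identical files.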
First, you should initialize arrays, variables, etc. if not doing so will mess with your program's correctness.
Second, it appears that in this particular case, not initializing the array did not affect the correctness of the original program. Instead, the program meant to compare the files does not know enough about the file format used to tell if the files differ in a meaningful way ("meaningful" defined by the first program).
Instead of complaining about the original program, I would fix the comparison program to know more about the file format in question. If the file format isn't well documented then you've got a good reason to complain.
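A format-aware comparison could look something like the sketch below. It assumes the record holds two integers and a NUL-terminated string in a 64-byte array; the Record layout and the name records_equivalent are hypothetical. Only the fields and the string contents up to the terminator are compared, so leftover bytes after the terminator are ignored.

```cpp
#include <cstring>

// Hypothetical layout matching the description in the question.
struct Record {
    int id;
    int length;
    char name[64];
};

// Compare only the meaningful parts of two records.
inline bool records_equivalent(const Record& a, const Record& b) {
    // strncmp stops at the first NUL, or after 64 characters if there is none,
    // so bytes following the terminator never influence the result.
    return a.id == b.id &&
           a.length == b.length &&
           std::strncmp(a.name, b.name, sizeof a.name) == 0;
}
```

Two records that hold the same string but differ in the unused bytes are then reported as equal, while a plain memcmp over the array would still see a difference.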