Cleaning up Legacy Code “header spaghetti”

Posted 2019-06-24 01:42

Question:

Any recommended practices for cleaning up "header spaghetti" which is causing extremely slow compilation times (Linux/Unix)?

Is there any equivalent to "#pragma once" with GCC?
(found conflicting messages regarding this)

Thanks.

Answer 1:

Assuming you're familiar with "include guards" (#ifndef at the beginning of the header...), an additional way of speeding up build time is to use external include guards. This was discussed in "Large Scale C++ Software Design". The idea is that classic include guards, unlike #pragma once, do not spare you the preprocessor work required to ignore the header from the second inclusion onwards (i.e. it still has to open the file and scan for the start and end of the include guard). With external include guards you place the #ifndef/#endif around the #include line itself.

So it looks like this:

#ifndef MY_HEADER
#include "myheader.h"
#endif

and of course within the H file you have the classic include guard

#ifndef MY_HEADER
#define MY_HEADER

// content of header

#endif

This way the myheader.h file isn't even opened / parsed by the preprocessor, and it can save you a lot of time in large projects, especially when header files sit on shared remote locations, as they sometimes do.

again, it's all in that book. hth



Answer 2:

If you want to do a complete cleanup and have the time to do it, then the best solution is to delete all the #includes in all the files (except for obvious ones, e.g. abc.h in abc.cpp) and then compile the project. Add the necessary forward declaration or header to fix the first error, and then repeat until you compile cleanly.

This doesn't fix underlying problems that can result in include issues, but it does ensure that the only includes are the required ones.
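
To illustrate the "forward declaration or header" choice, here is a minimal sketch. The names Car and Engine are hypothetical, and the three "files" are collapsed into one translation unit only so that the example compiles on its own. The point is that car.h holds just a pointer to Engine, so a forward declaration is enough there and only car.cpp needs the full definition.

```cpp
#include <cassert>
#include <memory>

// ---- car.h (hypothetical) ----
// A forward declaration is enough here: Car only stores a pointer to Engine,
// so car.h does not need to #include "engine.h".
class Engine;

class Car {
public:
    explicit Car(int horsepower);
    ~Car();  // defined in car.cpp, where Engine is a complete type
    int horsepower() const;
private:
    std::unique_ptr<Engine> engine_;
};

// ---- engine.h (hypothetical) ----
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) { assert(hp >= 0); }
    int hp() const { return hp_; }
private:
    int hp_;
};

// ---- car.cpp (hypothetical): the only place that needs engine.h ----
Car::Car(int horsepower) : engine_(new Engine(horsepower)) {}
Car::~Car() = default;
int Car::horsepower() const { return engine_->hp(); }
```

A header that stores a type by value, derives from it, or calls its members inline still needs the real #include.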



Answer 3:

I've read that GCC considers #pragma once deprecated (this seems to apply to older releases only; it was un-deprecated in GCC 3.4), although even #pragma once can only do so much to speed things up.

To try to untangle the #include spaghetti, you can look into doxygen. It can generate graphs of included headers, which may give you an edge on simplifying things. I can't recall the details offhand, but the graph features require GraphViz to be installed, and you may have to tell doxygen the path where it can find GraphViz's dot tool.

Another approach you might consider if compile time is your primary concern is setting up Precompiled Headers.
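
With GCC, a precompiled header is just an ordinary header compiled ahead of time. A minimal sketch, assuming a hypothetical file named pch.h that collects stable, widely used includes (the standard headers listed are only examples, and foo.cpp is a placeholder):

```cpp
// pch.h (hypothetical name): headers that rarely change and are used widely.
// Assumed build steps with GCC, run from the directory containing pch.h:
//   g++ -x c++-header pch.h -o pch.h.gch            (precompile once)
//   g++ -Winvalid-pch -include pch.h -c foo.cpp     (use it for each source)
// GCC looks for pch.h.gch next to pch.h and uses it when the compile
// options match; -Winvalid-pch warns when a .gch is found but cannot be used.
#ifndef PCH_H
#define PCH_H

#include <cassert>
#include <map>
#include <string>
#include <vector>

#endif // PCH_H
```

Because the .gch file is only picked up when the options match, it helps to precompile with the same flags as the rest of the build.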



Answer 4:

I read the other day about a neat trick to reduce header dependencies: Write a script that will

  • find all #include statements
  • remove one statement at a time and recompile
  • if compilation fails, add the include statement back in

At the end, you'll hopefully end up with the minimum of required includes in your code. You could write a similar script that re-arranges includes to find out if they are self-sufficient, or require other headers to be included before them (include the header first, see if compilation fails, report it). That should go some way to cleaning up your code.
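
The remove-and-recompile loop above can be sketched as follows. This is only a rough outline under stated assumptions: find_include_lines and prune_includes are hypothetical names, and the actual "does it still compile?" step is left to a caller-supplied function (for example, one that writes the lines to disk and invokes the compiler).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the indices of lines that look like #include directives.
std::vector<std::size_t> find_include_lines(const std::vector<std::string>& lines) {
    static const std::regex include_re(R"(^\s*#\s*include\s*[<"].*)");
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (std::regex_match(lines[i], include_re)) hits.push_back(i);
    }
    return hits;
}

// Tries to drop each #include in turn and keeps the removal only when the
// caller-supplied `compiles` check still succeeds.
template <typename CompileCheck>
std::vector<std::string> prune_includes(std::vector<std::string> lines,
                                        CompileCheck compiles) {
    assert(compiles(lines));  // precondition: the starting point must build
    // Walk backwards so erasing a line does not shift indices still to visit.
    const std::vector<std::size_t> includes = find_include_lines(lines);
    for (auto it = includes.rbegin(); it != includes.rend(); ++it) {
        std::vector<std::string> candidate = lines;
        candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(*it));
        if (compiles(candidate)) lines = candidate;  // include was unnecessary
        // Otherwise `lines` is left alone, i.e. the include is "added back".
    }
    return lines;
}
```

Keep in mind that a file may still compile after a removal only because another header happens to pull in the same declarations, so the result is worth reviewing by hand.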

Some more notes:

  • Modern compilers (gcc among them) recognize header guards and optimize in the same way as #pragma once would, only opening the file once.
  • #pragma once can be problematic when the same file has different names in the filesystem (e.g. with soft links).
  • gcc supports #pragma once; older releases called it "obsolete", but it has not been deprecated since GCC 3.4.
  • #pragma once isn't supported by all compilers, and is not part of the C or C++ standard.
  • Compilers aren't the only possible problem: tools like Incredibuild also have issues with #pragma once.


Answer 5:

Richard was somewhat right (why was his solution voted down?).

Anyway, all C/C++ headers should use internal include guards.

This said, either:

1 - Your legacy code is not really maintained anymore, and you should use pre-compiled headers (which are a hack, but hey... Your need is to speed up your compilation, not refactor unmaintained code)

2 - Your legacy code is still living. Then, you either use the precompiled headers and/or the guards/external guards for a temporary solution, but in the end, you'll need to remove all your includes, one .C or .CPP at a time, and compile each .C or .CPP file one at a time, correcting their includes with forward-declarations or includes when necessary (or even breaking a large include into smaller ones to be sure each .C or .CPP file will get only the headers it needs). Anyway, testing and removing obsolete includes is part of maintenance of a project, so...

My own experience with precompiled headers was not exactly a good one, because half the time, the compiler could not find a symbol I had defined, and so I tried a full "clean/rebuild", to be sure it was not the precompiled header that was obsolete. So my guess is to use it for external libraries you won't even touch (like the STL, C API headers, Boost, whatever). Still, my own experience was with Visual C++ 6, so I guess (hope?) they got it right, now.

Now, one last thing: Headers should always be self-sufficient. That means that if the inclusion of headers depends on order of inclusion, then you have a problem. For example, if you can write:

#include "AAA.hpp"
#include "BBB.hpp"

But not:

#include "BBB.hpp"
#include "AAA.hpp"

because BBB depends on AAA, then all you have is a dependency you never acknowledged in the code. Not acknowledging it with an #include will only make your compilation a nightmare. BBB should include AAA, too (even if that could be somewhat slower: in the end, forward declarations will clean out useless includes anyway, so you should end up with a faster compile time).
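
As a sketch of what "BBB should include AAA" looks like, the example below collapses both headers and a small check file into one translation unit so that it compiles on its own. The struct members and the check_bbb function are made up for illustration.

```cpp
// ---- AAA.hpp ----
#ifndef AAA_HPP
#define AAA_HPP
#include <string>
struct AAA {
    std::string name;
};
#endif // AAA_HPP

// ---- BBB.hpp ----
#ifndef BBB_HPP
#define BBB_HPP
// BBB stores an AAA by value, so it needs the full definition. In the real
// file this spot would hold: #include "AAA.hpp"
// (the guard in AAA.hpp makes a repeated inclusion harmless).
struct BBB {
    AAA owner;
    int count;
};
#endif // BBB_HPP

// ---- check_BBB.cpp (hypothetical): includes BBB.hpp first, nothing before ----
// If BBB.hpp were not self-sufficient, this file would fail to compile.
#include <cassert>
void check_bbb() {
    BBB b{AAA{"x"}, 1};
    assert(b.owner.name == "x");
    assert(b.count == 1);
}
```

A one-line source file per header that includes only that header is a cheap way to test self-sufficiency across a whole project.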



Answer 6:

Use one or more of the following to speed up the build:

  1. Use Precompiled Headers
  2. Use a caching mechanism (scons for example)
  3. Use a distributed build system (distcc, or the commercial Incredibuild)


Answer 7:

In headers: include another header only if you can't use a forward declaration, but always #include any file that you do need (hidden include dependencies are evil!).



Answer 8:

As mentioned in the other answer, you should definitely use forward declarations whenever possible. To my knowledge, GCC doesn't have anything equivalent to #pragma once, which is why I stick to the old-fashioned style of include guards.



Answer 9:

Thanks for the replies, but the question is regarding existing code which includes strict "include order" etc. The question is whether there are any tools/scripts to clarify what is actually going on.

Header guards aren't the solution, as they don't prevent the compiler from reading the whole file again and again and ...



Answer 10:

PC-Lint will go a long way towards cleaning up spaghetti headers. It will also catch other problems for you, such as uninitialised variables going unnoticed.



Answer 11:

As onebyone.livejournal.com commented in a response to your question, some compilers support include guard optimization, which the page I linked defines as follows:

The include guard optimisation is when a compiler recognises the internal include guard idiom described above and takes steps to avoid opening the file multiple times. The compiler can look at an include file, strip out comments and white space and work out if the whole of the file is within the include guards. If it is, it stores the filename and include guard condition in a map. The next time the compiler is asked to include the file, it can check the include guard condition and make the decision whether to skip the file or #include it without needing to open the file.
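
The bookkeeping described in that quote can be modelled roughly as below. This is a toy sketch with hypothetical names (GuardCache, remember, define, can_skip), not how any particular compiler is implemented.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>

// Toy model of the include guard optimisation's map.
class GuardCache {
public:
    // Called after a file was fully read and found to be wrapped entirely in
    // "#ifndef guard ... #endif".
    void remember(const std::string& file, const std::string& guard) {
        assert(!guard.empty());
        guard_of_[file] = guard;
    }

    // Records that a macro is now defined (e.g. by "#define MY_HEADER").
    void define(const std::string& macro) { defined_.insert(macro); }

    // True when the file can be skipped without opening it: its guard is
    // known and the guard macro is already defined.
    bool can_skip(const std::string& file) const {
        auto it = guard_of_.find(file);
        return it != guard_of_.end() && defined_.count(it->second) > 0;
    }

private:
    std::unordered_map<std::string, std::string> guard_of_;
    std::set<std::string> defined_;
};
```

The first inclusion pays the full cost of reading the file; every later one is answered from the map alone.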

Then again, you already answered that external include guards are not the answer to your question. For disentangling header files that must be included in a specific order, I would suggest the following:

  • Each .c or .cpp file should #include the corresponding .h file first, and the rest of its #include directives should be sorted alphabetically. You will usually get build errors when this breaks unstated dependencies between header files.
  • If you have a header file that defines global typedefs for basic types or global #define directives that are used for most of the code, each .h file should #include that file first, and the rest of its #include directives should be sorted alphabetically.
  • When these changes cause compile errors, you will usually have to add an explicit dependency from one header file to another, in the form of an #include.
  • When these changes do not cause compile errors, they might cause behavioral changes. Hopefully you have some sort of test suite that you can use to verify the functionality of your application.

It also sounds like part of the problem might be that incremental builds are much slower than they ought to be. This situation can be improved with forward declarations or a distributed build system, as others have pointed out.