I am building a lot of auto-generated code, including one particularly large file (~15K lines), using a mingw32 cross compiler on Linux. Most files compile extremely quickly, but this one large file takes an unexpectedly long time (~15 minutes) to compile.
I have tried manipulating various optimization flags to see if they had any effect, without any luck. What I really need is some way of determining what g++ is doing that is taking so long. Are there any (relatively simple) ways to have g++ generate output about different phases of compilation, to help me narrow down what the hang-up might be?
Sadly, I do not have the ability to rebuild this cross-compiler, so adding debugging information to the compiler and stepping through it is not a possibility.
What's in the file:
- a bunch of includes
- a bunch of string comparisons
- a bunch of if-then checks and constructor invocations
The file is a factory for producing a ton of different specific subclasses of a certain parent class. Most of the includes, however, are nothing terribly fancy.
The results of -ftime-report, as suggested by Neil Butterworth, indicate that the "life analysis" phase is taking 921 seconds, which takes up most of the 15 minutes.
It appears that this takes place during data flow analysis. The file itself is a bunch of conditional string comparisons, constructing an object by class name provided as a string.
We think changing this to point into a map of names to function pointers might improve things a bit, so we're going to try that.
Indeed, generating a bunch of factory functions (per object) and creating a map from the string name of the object to a pointer to its factory function reduced compile time from the original 15 minutes to about 25 seconds, which will save everyone tons of time on their builds.
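For anyone curious, here is a rough sketch of the shape we ended up with. The names (Base, Foo, Bar, createObject) are placeholders rather than the real generated classes, and it sticks to plain C++98-style code:

```
#include <map>
#include <string>

struct Base { virtual ~Base() {} };
struct Foo : public Base {};
struct Bar : public Base {};

// One tiny factory function per subclass, generated alongside each class.
static Base* makeFoo() { return new Foo(); }
static Base* makeBar() { return new Bar(); }

typedef Base* (*FactoryFn)();

// The map from class name to factory function replaces the long chain of
// conditional string comparisons with a single lookup.
Base* createObject(const std::string& name) {
    static std::map<std::string, FactoryFn> factories;
    if (factories.empty()) {
        factories["Foo"] = &makeFoo;
        factories["Bar"] = &makeBar;
    }
    std::map<std::string, FactoryFn>::const_iterator it = factories.find(name);
    if (it == factories.end())
        return 0;
    return it->second();
}
```

Presumably the win comes from replacing one enormous function full of branches with many trivial ones, which gives the data flow analysis far less to work through per function.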
Thanks again to Neil Butterworth for the tip about -ftime-report.
What the compiler sees is the output of the pre-processor, so the size of the individual source file is not a good measure; you have to consider the source plus all the files it includes, and the files they include, and so on. Instantiation of templates for multiple types generates code for each separate type used, so that can end up being a lot of code, for example if you have made extensive use of STL containers for many classes.
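As a generic illustration (not taken from the file in question), each distinct specialization used in a translation unit gets its own instantiation of the member functions that are actually called:

```
#include <map>
#include <string>
#include <vector>

// Three distinct specializations, so three separate sets of constructors,
// destructors, push_back, operator[], etc. get instantiated and compiled.
void fill() {
    std::vector<int> a;
    std::vector<std::string> b;
    std::map<std::string, std::vector<double> > c;
    a.push_back(1);
    b.push_back("x");
    c["x"].push_back(1.0);
}
```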
15K lines in one source file is rather a lot, but even if split up, all that code still needs to be compiled; however, using an incremental build may mean that it does not all need compiling all the time. There really is no need for a file that large; it's just poor practice/design. I start thinking about better modularisation when a file gets to 500 lines (although I am not dogmatic about it).
I'd use #if 0 / #endif to eliminate large portions of the source file from compilation. Repeat with different blocks of code until you pinpoint which block(s) are slow. For starters, you can see if your #includes are the problem by using #if 0 / #endif to exclude everything but the #includes (a sketch of this follows below).

It most probably includes TONNES of includes. I believe -MD will list out all the include files in a given CPP file (that includes includes of includes, and so forth).
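To illustrate the bisection idea (the file layout and names here are made up, not the asker's actual code):

```
// The #includes stay visible to the compiler, while everything below the
// #if 0 is excluded. If the file now compiles quickly, the headers are not
// the problem; move the #if 0 / #endif pair in steps to bisect the slow part.
#include <map>
#include <string>
// ... in the real file, the many generated includes would be here ...

#if 0
struct Base { virtual ~Base() {} };

Base* createObject(const std::string& name) {
    // ... the thousands of conditional string comparisons live here ...
    return 0;
}
#endif
```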
Related to @Goz and @Josh_Kelley, you can get gcc/g++ to spit out the preprocessed source (with #includes inline) using -E. That's one way to determine just how large your source is.
And if the compiler itself is the problem, you may be able to strace the compile command that's taking so long to see whether there's a particular file access or a specific internal action where the time is going.
Won't give all the details you want, but try running with the -v (verbose) and -ftime-report flags. The latter produces a summary of what the compiler has been up to.

Another process to try is to add "progress marker" pragmas to your code to trap the portion of the code that is taking a long time. The Visual Studio compiler provides #pragma message(), although there is not a standard pragma for doing this.

Put one marker at the beginning of the code and a marker at the end of the code. The end marker could be a #error, since you don't care about the remainder of the source file. Move the markers accordingly to trap the section of code taking the longest time.

Just a thought...
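Since the asker is on g++ rather than Visual Studio, here is a sketch of the same idea using #warning (a GCC extension) as the progress marker and #error as the end marker; the code in between is placeholder:

```
#include <string>
// ... the file's real includes ...

#warning "progress marker: includes processed"

// ... first chunk of the generated factory code ...

#warning "progress marker: halfway through the factory"

#error "end marker: the rest of the file is deliberately not compiled"

// ... remainder of the file, skipped for this timing run ...
```

Each #warning prints a diagnostic when the compiler reaches it, and the #error fails the build at that point, so you can time how long it takes to reach each marker and move them until the slow section is bracketed.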