Why does C++ compilation take so long?

Posted 2019-01-01 07:26

Compiling a C++ file takes a very long time compared to C# or Java. It takes significantly longer to compile a C++ file than it would to run a normal-size Python script. I'm currently using VC++, but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but that doesn't seem like it should explain why it takes so long.

14 answers
看风景的人
#2 · 2019-01-01 08:03

As already commented, the compiler spends a lot of time instantiating, and re-instantiating, templates. It does so to such an extent that there are projects focused on that particular problem, claiming an observed 30x speed-up in some really favorable cases. See http://www.zapcc.com.

只若初见
#3 · 2019-01-01 08:07

Parsing and code generation are actually rather fast. The real problem is opening and closing files. Remember, even with include guards, the compiler still has to open the .h file and read each line (and then ignore it).

A friend once (while bored at work) took his company's application and put everything -- all source and header files -- into one big file. Compile time dropped from 3 hours to 7 minutes.
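For illustration, here is a minimal sketch of that "everything in one file" approach (nowadays often called a unity build); the file names are hypothetical:

```cpp
// unity.cpp -- hypothetical "unity build" translation unit.
// Instead of compiling each .cpp separately (re-opening and re-reading
// every shared header each time), all sources are pulled into a single
// translation unit, so each header is opened and parsed only once.
#include "parser.cpp"
#include "renderer.cpp"
#include "main.cpp"
```

The trade-off is that all file-local names now share one translation unit, so `static` helpers or anonymous-namespace names that collide between files will break the build.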

牵手、夕阳
#4 · 2019-01-01 08:09

C++ is compiled into machine code. So you have the pre-processor, the compiler, the optimizer, and finally the assembler, all of which have to run.

Java and C# are compiled into bytecode/IL, which the Java virtual machine or the .NET runtime then interprets or JIT-compiles into machine code at execution time.

Python is an interpreted language that is also compiled into byte-code.

I'm sure there are other reasons for this as well, but in general, not having to compile to native machine language saves time.

梦寄多情
#5 · 2019-01-01 08:09

Several reasons

Header files

Every single compilation unit requires hundreds or even thousands of headers to be (1) loaded and (2) compiled. Every one of them typically has to be recompiled for every compilation unit, because the preprocessor means the result of compiling a header can vary from one compilation unit to the next: a macro defined in one compilation unit may change the content of the header.

This is probably the main reason, as it requires huge amounts of code to be compiled for every compilation unit, and additionally, every header has to be compiled multiple times (once for every compilation unit that includes it).
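Here is a contrived sketch of why the compiler can't simply reuse one compiled form of a header across compilation units (the file names and the macro are invented, and the three files are shown inline for brevity):

```cpp
// config.h -- what this header expands to depends on macros defined
// by whoever includes it, so its compiled result is per-compilation-unit.
#ifdef USE_DOUBLE
typedef double real_t;
#else
typedef float real_t;
#endif

// a.cpp
#define USE_DOUBLE
#include "config.h"   // in this compilation unit, real_t is double

// b.cpp
#include "config.h"   // in this one, real_t is float
```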

Linking

Once compiled, all the object files have to be linked together. This is basically a monolithic process that can't very well be parallelized, and has to process your entire project.

Parsing

The syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate. This takes a lot of time.

Templates

In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program. In C++, vector<int> is a completely separate type from vector<float>, and each one will have to be compiled separately.
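For example, this small program (a toy illustration) forces two independent instantiations, each of which must be compiled separately:

```cpp
#include <vector>

int main() {
    std::vector<int>   vi{1, 2, 3};    // instantiates std::vector<int>
    std::vector<float> vf{1.0f, 2.0f}; // instantiates std::vector<float>,
                                       // a completely separate type whose
                                       // members are generated again
    return static_cast<int>(vi.size() + vf.size());
}
```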

Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret, and this can become ridiculously complicated. Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations. Templates may also result in extremely complex types, with ridiculously long names, adding a lot of extra work to the linker. (It has to compare a lot of symbol names, and if these names can grow to many thousands of characters, that becomes fairly expensive.)
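A classic toy example of such a recursive metaprogram: computing 10! at compile time forces eleven distinct class instantiations before a single line of the program runs.

```cpp
// Factorial<10> triggers Factorial<9>, Factorial<8>, ... Factorial<0>:
// eleven separate class types the compiler must instantiate.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {   // explicit specialization: the recursion's base case
    static const unsigned long long value = 1;
};

static_assert(Factorial<10>::value == 3628800ULL,
              "evaluated entirely at compile time");
```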

And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers, which means far more code has to be parsed and compiled for every compilation unit. In plain C code, a header typically only contains forward declarations, but very little actual code. In C++, it is not uncommon for almost all the code to reside in header files.
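A sketch of that contrast (both headers are hypothetical):

```cpp
// checksum.h, C style: a declaration only. The body lives in one .c file
// and is compiled exactly once for the whole project.
int checksum(const char* data, int len);

// checksum.hpp, C++ template style: the full definition must be visible
// to every includer, so this body is parsed and compiled again in every
// translation unit that uses it.
template <typename T>
T checksum(const T* data, int len) {
    T sum = T{};
    for (int i = 0; i < len; ++i)
        sum += data[i];
    return sum;
}
```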

Optimization

C++ allows for some very dramatic optimizations. Neither C# nor Java allows classes to be eliminated completely (they have to remain available for reflection), but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes, all of which are inlined and eliminated again in the optimization phase.

Moreover, a C++ program must be fully optimized by the compiler. A C# program can rely on the JIT compiler to perform additional optimizations at load time; C++ gets no such "second chance". What the compiler generates is as optimized as the program is going to get.

Machine

C++ is compiled to machine code, which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86). (This is mentioned only for completeness, because it came up in comments. In practice, this step is unlikely to take more than a tiny fraction of the total compilation time.)

Conclusion

Most of these factors are shared by C code, which actually compiles fairly efficiently. The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates. They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.

只靠听说
#6 · 2019-01-01 08:11

The biggest issues are:

1) The endless header reparsing, already mentioned. Mitigations (like #pragma once) usually work only per compilation unit, not per build (see the sketch after this list).

2) The fact that the toolchain is often split across multiple binaries (make, preprocessor, compiler, assembler, archiver, impdef, linker, and dlltool in extreme cases), each of which has to reinitialize and reload all of its state on every invocation (compiler, assembler) or every couple of files (archiver, linker, and dlltool).
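To illustrate the first point, here is a minimal sketch (with a hypothetical header) of the two common per-compilation-unit mitigations. Neither helps when a different .cpp file includes the same header, which must then be opened and scanned all over again:

```cpp
// widget.h -- hypothetical header
#ifndef WIDGET_H      // classic include guard: on a second #include within
#define WIDGET_H      // the SAME compilation unit, the compiler still opens
                      // the file but skips its contents
// #pragma once       // non-standard but widely supported alternative that
                      // lets the compiler avoid re-opening the file at all

struct Widget { int id; };

#endif // WIDGET_H
```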

See also this discussion on comp.compilers: http://compilers.iecc.com/comparch/article/03-11-078, especially this one:

http://compilers.iecc.com/comparch/article/02-07-128

Note that John, the moderator of comp.compilers, seems to agree; this means it should be possible to achieve similar speeds for C too, if the toolchain is fully integrated and precompiled headers are implemented. Many commercial C compilers do this to some degree.

Note that the Unix model of factoring everything out into separate binaries is something of a worst-case model for Windows (with its slow process creation). It is very noticeable when comparing GCC build times between Windows and *nix, especially if the make/configure system also calls some programs just to obtain information.

若你有天会懂
#7 · 2019-01-01 08:12

The slowdown is not necessarily the same with any compiler.

I haven't used Delphi or Kylix, but back in the MS-DOS days, a Turbo Pascal program would compile almost instantaneously, while the equivalent Turbo C++ program would just crawl.

The two main differences were a very strong module system and a syntax that allowed single-pass compilation.

It's certainly possible that compilation speed just hasn't been a priority for C++ compiler developers, but there are also some inherent complications in the C/C++ syntax that make it more difficult to process. (I'm not an expert on C, but Walter Bright is, and after building various commercial C/C++ compilers, he created the D language. One of his changes was to enforce a context-free grammar to make the language easier to parse.)

Also, you'll notice that Makefiles are generally set up so that every C file is compiled separately, so if 10 source files all use the same include file, that include file is processed 10 times.
