I am focusing on the CPU/memory consumption of programs compiled by GCC.
Is code compiled with -O3 always so greedy in terms of resources?
Is there any scientific reference or specification that shows the difference in memory/CPU consumption between the optimization levels?
People working on this problem often focus on the impact of these optimizations on execution time, compiled code size, and energy. However, I can't find much work on resource consumption when optimizations are enabled.
Thanks in advance.
No, there is no absolute way, because optimization in compilers is an art (it is not even well defined, and might be undecidable or intractable).
But some guidelines first:
Be sure that your program is correct and has no bugs before optimizing anything, so debug and test your program.
Have well-designed test cases and representative benchmarks.
Be sure that your program has no undefined behavior (this is tricky), since GCC will optimize strangely (but very often correctly, according to the C99 or C11 standards) if you have UB in your code. Use the -fsanitize= style options (and gdb and valgrind...) during the debugging phase.
Profile your code (on various benchmarks), in particular to find out which parts are worth optimization efforts. Often (but not always) most of the CPU time is spent in a small fraction of the code (rule of thumb: 80% of the time is spent in 20% of the code). On some applications, like the gcc compiler itself, this is not true; check with gcc -ftime-report, which asks gcc to show the time spent in its various modules. Most of the time "premature optimization is the root of all evil" (but there are exceptions to this aphorism).
Improve your source code: for example, use restrict and const carefully and correctly, add some pragmas or function or variable attributes, and perhaps make judicious use of some GCC builtins such as __builtin_expect, __builtin_prefetch, __builtin_unreachable...
Use a recent compiler. The current version of GCC (October 2015) is 5.2 (and GCC 8 in June 2018), and optimization keeps improving; you might consider compiling GCC from its source code to get a recent version.
Enable all warnings in the compiler (gcc -Wall -Wextra), and try hard to avoid all of them; some warnings may appear only when you ask for optimization (e.g. with -O2).
Usually, compile with -O2 -march=native (or perhaps -mtune=native) and benchmark your program with that. I assume that you are not cross-compiling; if you are, add the appropriate -march option.
Consider link-time optimization by compiling and linking with -flto and the same optimization flags. For example, put CC= gcc -flto -O2 -march=native in your Makefile, then remove -O2 -march=native (or -mtune=native) from your CFLAGS there.
Also try -O3 -march=native. Usually you might get a tiny improvement over -O2, but not always: sometimes -O2 gives slightly faster code than -O3, although this is uncommon.
If you want to optimize the generated program size, use -Os instead of -O2 or -O3. More generally, don't forget to read the section Options That Control Optimization of the documentation. I guess that both -O2 and -Os would optimize stack usage (which is closely related to memory consumption). Some GCC optimizations are also able to avoid malloc (which is related to heap memory consumption).
You might consider profile-guided optimizations: the -fprofile-generate, -fprofile-use and -fauto-profile options.
Dive into the documentation of GCC: it has numerous optimization and code generation arguments (e.g. -ffast-math, -Ofast...) and parameters, and you could spend months trying more of them. Beware that some of them are not strictly C standard conforming!
Recent GCC and Clang can emit DWARF debug information even when optimizing (somewhat "approximate" if strong optimizations have been applied), so passing both -O2 and -g could be worthwhile. You would still be able, with some pain, to use the gdb debugger on the optimized executable.
If you have a lot of time to spend (weeks or months), you might customize GCC using MELT (or some other plugin) to add your own new (application-specific) optimization passes. But this is difficult (you'll need to understand GCC's internal representations and organization) and probably rarely worthwhile, except in very specific cases where you can justify spending months of your time on improving optimization.
You might want to understand the stack usage of your program; for that, use -fstack-usage.
You might want to understand the emitted assembler code: use -S -fverbose-asm in addition to the optimization flags, and look into the produced .s assembler file.
You might want to understand the internal workings of GCC: use the various -fdump-* flags (you'll get hundreds of dump files!).
Of course, the above to-do list should be used in an iterative and agile fashion.
For memory leak bugs, consider valgrind and several -fsanitize= debugging options. Also read about garbage collection (and the GC handbook), notably Boehm's conservative garbage collector, and about compile-time garbage collection techniques.
Read about the MILEPOST project in GCC.
Consider also OpenMP, OpenCL, MPI, multi-threading, etc. Notice that parallelization is a difficult art.
Notice that even GCC developers are often unable to predict the effect of a given optimization on the CPU time of the produced binary. In some ways, optimization is a black art.
Perhaps gcc-help@gcc.gnu.org might be a good place to ask more specific, precise and focused questions about optimizations in GCC.
You could also contact me at basile at starynkevitch dot net with a more focused question (and mention the URL of your original question).
There are many scientific papers on optimizations. Start with ACM TOPLAS, ACM TACO, etc., and search for iterative compiler optimization and related topics. Also define more precisely which resources you want to optimize for: "memory consumption" by itself means next to nothing.