I have a C++ application, running on Linux, which I'm in the process of optimizing. How can I pinpoint which areas of my code are running slowly?
Use the -pg flag when compiling and linking the code, then run the executable. While the program executes, profiling data is collected in the file gmon.out in the current working directory. There are two different types of profiling:
1- Flat profiling:
By running the command
gprof --flat-profile a.out
you get the following data:
- what percentage of the overall time was spent in the function,
- how many seconds were spent in a function, including and excluding calls to sub-functions,
- the number of calls,
- the average time per call.
2- Graph profiling:
Use the command
gprof --graph a.out
to get the following data for each function:
- In each section, one function is marked with an index number.
- Above that function is a list of the functions that call it.
- Below that function is a list of the functions that it calls.
For more information, see https://sourceware.org/binutils/docs-2.32/gprof/
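For concreteness, a minimal sketch of that workflow, assuming a single source file main.cpp and an output binary named myapp (both names are illustrative):

```sh
# compile and link with -pg so the binary emits profiling data on exit
g++ -pg -O2 -o myapp main.cpp
# run the program normally; it writes gmon.out in the current directory
./myapp
# flat profile: per-function time, call counts, average time per call
gprof --flat-profile myapp gmon.out
# call graph: for each indexed function, its callers and callees
gprof --graph myapp gmon.out
```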
Since no one has mentioned Arm MAP yet, I'll add it, as I have personally used MAP successfully to profile a C++ scientific program.
Arm MAP is the profiler for parallel, multithreaded or single threaded C, C++, Fortran and F90 codes. It provides in-depth analysis and bottleneck pinpointing to the source line. Unlike most profilers, it's designed to be able to profile pthreads, OpenMP or MPI for parallel and threaded code.
MAP is commercial software.
Newer kernels (e.g. the latest Ubuntu kernels) come with the new 'perf' tools (apt-get install linux-tools), AKA perf_events. These come with classic sampling profilers (see the man page) as well as the awesome timechart!
The important thing is that these tools can do system-wide profiling and not just per-process profiling - they can show the interaction between threads, processes and the kernel, and let you understand the scheduling and I/O dependencies between processes.
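As a rough sketch of a typical session (the binary name myapp is illustrative; building with -g makes the reported call stacks easier to read):

```sh
# sample the whole run of the program, recording call chains
perf record -g ./myapp
# interactive breakdown of where CPU time went
perf report
# non-interactive textual summary
perf report --stdio
# system-wide timechart of CPU and scheduler activity while the program runs
perf timechart record ./myapp
perf timechart    # writes output.svg
```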
For single-threaded programs you can use igprof, The Ignominous Profiler: https://igprof.org/.
It is a sampling profiler, along the lines of the... long... answer by Mike Dunlavey, which will gift wrap the results in a browsable call stack tree, annotated with the time or memory spent in each function, either cumulative or per-function.
Actually, I'm a bit surprised that not many have mentioned google/benchmark. While it is a bit cumbersome to pin down the specific area of code, especially if the code base is a large one, I found it really helpful when used in combination with callgrind.
IMHO, identifying the piece of code that is causing the bottleneck is the key here. I'd however first try to work out where the problem lies and choose a tool based on that.
valgrind with the combination of callgrind and kcachegrind should provide a decent estimation of where the time is going, and once it's established that there are issues with some section of code, I'd suggest doing a micro-benchmark - google benchmark is a good place to start.
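A sketch of that workflow (file and binary names are illustrative, and the google/benchmark library has to be installed separately):

```sh
# whole-program call-graph profiling; inspect the result in KCachegrind
valgrind --tool=callgrind ./myapp
kcachegrind callgrind.out.*     # one output file per run, suffixed with the PID
# once a hot spot is identified, isolate it in a micro-benchmark
# (bench_hot_path.cpp would use the BENCHMARK macro from google/benchmark)
g++ -O2 -std=c++17 bench_hot_path.cpp -lbenchmark -lpthread -o bench_hot_path
./bench_hot_path
```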