I have a larger C++ program which starts out by reading thousands of small text files into memory and storing the data in STL containers. This takes about a minute. Periodically, a compilation will exhibit behavior where that initial part of the program runs at only about 22-23% CPU load. Once that step is over, it goes back to ~100% CPU. It is more likely to happen with the -O2 flag turned on, but not consistently. It happens even less often with the -p flag, which makes it almost impossible to profile. I did capture it once, but the gprof output wasn't helpful: everything runs at the same relative speed, just at low CPU usage.
I am quite certain that this has nothing to do with multiple cores. I do have a quad-core CPU, and most of the code is multi-threaded, but I tested this issue running a single thread. Also, when I run the problematic step in multiple threads, each thread only runs at ~20% CPU.
I apologize ahead of time for the vagueness of the question, but I have run out of ideas as to how to troubleshoot it further, so any hints might be helpful.

UPDATE: Just to make sure it's clear: the problematic part of the code does sometimes (~30-40% of the compilations) run at 100% CPU, so it's hard to buy the (otherwise reasonable) argument that I/O is the bottleneck.
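For reference, the loading phase is essentially a loop like the one below (a minimal sketch; the container choice and the line-by-line parsing are simplified stand-ins, not my actual code):

```cpp
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Illustrative only: read every small file up front and keep its lines
// in an STL container, keyed by path.
std::map<std::string, std::vector<std::string> > load_files(
    const std::vector<std::string>& paths) {
    std::map<std::string, std::vector<std::string> > data;
    for (size_t i = 0; i < paths.size(); ++i) {
        std::ifstream in(paths[i].c_str()); // one open/read/close per file
        std::string line;
        while (std::getline(in, line))
            data[paths[i]].push_back(line);
    }
    return data;
}
```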
It's the buffer cache
My guess is that you are seeing the results of the Linux buffer cache in operation.
Those thousands of files will take a long time to read in from disk, and the CPU will mostly be waiting on rotational and seek latencies. The CPU time used, expressed as a percentage, will be reported as low (though it is probably greater overall).

But once read, those small files are completely cached in memory, and accessing each file (in subsequent runs) becomes a purely CPU-bound activity.

Whether the blocks remain in the cache depends on intervening activity, such as recompiles. When new programs are run and other files are read, the programs and the files will be cached and old blocks will be dropped; obviously, a memory-intensive workload will also clear out the buffer cache.
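If you want to see this directly, you can time the same bulk read twice in a row. A minimal sketch (the path list is left empty for you to fill in; dropping the cache, as noted in the comment, requires root):

```cpp
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Read every file once; return total bytes seen so the work isn't optimized away.
static size_t read_all(const std::vector<std::string>& paths) {
    size_t total = 0;
    std::string line;
    for (size_t i = 0; i < paths.size(); ++i) {
        std::ifstream in(paths[i].c_str());
        while (std::getline(in, line))
            total += line.size();
    }
    return total;
}

int main() {
    std::vector<std::string> paths; // fill with your thousands of small files

    for (int pass = 0; pass < 2; ++pass) {
        std::time_t start = std::time(NULL);
        size_t bytes = read_all(paths);
        // Pass 0 is cold (disk-bound) if you first run, as root:
        //   echo 3 > /proc/sys/vm/drop_caches
        // Pass 1 should be much faster: the pages are now in the buffer cache.
        std::cout << "pass " << pass << ": " << bytes << " bytes in "
                  << std::difftime(std::time(NULL), start) << " s\n";
    }
    return 0;
}
```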
Since you're reading a ton of small files, your program is blocked waiting on disk I/O for the majority of the time. Since the CPU isn't busy while it waits for the disk to ship the data to it, you see a load significantly below 100%. Once that's over, you're CPU-bound, and your program will eat all available CPU time.

The fact that it sometimes runs faster is because (as Jarryd & DigitalRoss mention) once you've read the files into system memory, they're in the OS's cache, so subsequent loads will be an order of magnitude faster, unless they've been evicted by other disk I/O. So if you run the program back-to-back, the second run will probably be much faster. If you wait a while (and do other stuff in the meantime), there may have been enough other disk I/O to evict those files from the cache, in which case it will take a long time to read them again.
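One way to confirm this from inside the program is to compare wall-clock time against CPU time around the loading phase. gprof samples CPU time, so time spent blocked on disk is invisible to it, which would explain the unhelpful profile. A minimal sketch:

```cpp
#include <ctime>
#include <iostream>

int main() {
    std::time_t wall_start = std::time(NULL);
    std::clock_t cpu_start = std::clock();

    // ... run the file-loading phase here ...

    double cpu_s  = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall_s = std::difftime(std::time(NULL), wall_start);

    // If wall_s is much larger than cpu_s, the process spent most of this
    // phase blocked on I/O rather than computing, which matches the
    // reported ~20-23% CPU readings.
    std::cout << "cpu: " << cpu_s << " s, wall: " << wall_s << " s\n";
    return 0;
}
```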
In addition to the other answers mentioning the buffer cache, if you want to understand what is going on during a compilation, you could pass some of the flags below to GCC (i.e. to `g++`, probably as a `CXXFLAGS` setting in your `Makefile`):

- `-v` to ask `g++` to show the involved subprocesses (e.g. `cc1plus` for the C++ compiler proper)
- `-time` to ask `g++` to report the time of each subprocess
- `-ftime-report` to ask `g++` (actually `cc1plus`) to report the time of internal phases or passes inside the compiler
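For example (the source file name here is just a placeholder), a one-off invocation combining all three would be:

```
g++ -O2 -v -time -ftime-report -c myfile.cpp
```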