Something I've noticed when testing code I write is that long-running operations tend to run much longer the first time a program is run than on subsequent runs, sometimes by a factor of 10 or more. Obviously there's some sort of cold cache/warm cache issue here, but I can't seem to figure out what it is.
It's not the CPU cache, since these long-running operations tend to be loops that I feed a lot of data to, and they should be fully loaded after the first iteration. (Plus, unloading and reloading the program should clear the cache.)
Also, it's not the disc cache. I've ruled that out by loading all data from disc up-front and processing it afterwards, and it's the actual CPU-bound data processing that's going slowly.
So what can cause my program to run slow the first time I run it, but then if I close it and run it again, it runs dramatically faster? I've seen this in several different programs that do very different things, so it seems to be a general issue.
EDIT: For clarification, I'm writing in Delphi, though I don't really think this is a Delphi-specific issue. But that means that whatever the problem is, it's not related to JIT issues, garbage collection issues, or any of the other baggage that managed code brings with it. And I'm not dealing with network connections. This is pure CPU-bound processing.
One example: a script compiler. It runs like this:
- Load entire file into memory from disc
- Lex the entire file into a queue of tokens
- Parse the queue into a tree
- Run codegen on the tree to produce bytecode
If I feed it an enormous script file (~100k lines), then once the entire thing has been loaded from disc into memory, the lex step takes about 15 seconds the first time I run it and about 2 seconds on subsequent runs. (And yes, I know that's still a long time. I'm working on that...) I'd like to know where that slowdown is coming from and what I can do about it.
Another factor I can think of is memory alignment (and the resulting cache-line fills). But suppose there are two cases, perfect alignment (fastest) and imperfect alignment (slower): one would then expect the slowdown to occur irregularly, depending on how memory happens to be laid out, rather than consistently on the first run.
Perhaps it has something to do with physical page layout? As far as I know, each memory-access goes through the MMU page table entries, so dispersed physical pages could be slower than consecutive pages. (Just a wild guess, this one)
Another thing I haven't seen mentioned yet is which core(s) your process is running on. Especially on hyper-threaded CPUs, running on the slower of the two cores might have a negative impact. Try setting the processor affinity mask to one and the same core for every run, and see whether that changes the measured runtime difference between the first and subsequent runs.
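For the affinity experiment, here is a minimal sketch in Python rather than Delphi, just to show the idea. The function name `pin_to_single_core` is made up for this example. On Windows it calls the Win32 `SetProcessAffinityMask` function through `ctypes`; on Linux it uses `os.sched_setaffinity`. A Delphi program would call the same Win32 function directly. I have not verified that this changes the timings described in the question; it only makes the core a controlled variable.

```python
import os
import sys


def pin_to_single_core():
    """Restrict the current process to a single CPU.

    Returns True if the affinity was changed, False if no supported API
    was found or the call failed.
    """
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # Declare the signatures so handles and masks are pointer-sized.
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        # Mask bit 0 = first logical processor (an arbitrary choice here).
        return bool(kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), 1))

    if hasattr(os, "sched_setaffinity"):
        # Pick the lowest CPU the process is currently allowed to use.
        cpu = min(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {cpu})
        return True

    return False  # e.g. macOS, where neither API is available


if __name__ == "__main__":
    print("pinned:", pin_to_single_core())
```

Call it once at startup, before the timed work begins, and compare first-run and later-run timings with the pin in place.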
By the way - how do you define 'first run'? Could it be that you've just compiled the executable? In that case (and I'm just guessing again here), some process (either the OS, a virus-scanner, or even some root-kit) might be busy analyzing your executable's behaviour, which might be skipped once the executable has been analyzed before. You could try to prove that by changing some random unimportant byte of your executable between runs, and see if that impacts the runtime negatively again?
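One low-risk way to try that experiment is sketched below in Python. Instead of editing a byte inside the image, it copies the executable and appends one byte to the end, which changes the file's hash without touching code or data. That is a substitution for the "change a random byte" idea, not the same thing. The helper names `sha256_of` and `make_variant` are made up for this example. The assumption (worth checking on your own build) is that the Windows loader ignores trailing data on an unsigned executable; on a signed one, the signature would no longer validate.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def make_variant(exe_path, variant_path, padding=b"\0"):
    """Copy exe_path to variant_path and append padding bytes.

    The copy has different contents (and so a different hash), which should
    make it look like a never-seen file to anything that caches per-file
    analysis results. Returns the new file's SHA-256 digest.
    """
    shutil.copyfile(exe_path, variant_path)
    with open(variant_path, "ab") as f:
        f.write(padding)
    return sha256_of(variant_path)
```

If the freshly made variant is slow on its first run and fast afterwards, while the untouched original stays fast, that points at something keyed on the file's contents rather than at your code.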
Please post a summary once you've figured out the cause(s) - this might help others too. Cheers!
I'm guessing you're using .NET; if I'm wrong, you can ignore most of my ideas...
Connection pooling, JIT compilation, reflection, IO caching... the list goes on and on.
Try testing smaller portions of the code to see what parts change performance the most...
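A minimal sketch of what timing smaller portions could look like, written in Python for brevity. The `lex` and `parse` functions are toy stand-ins for your own stages, and the `timed` helper is made up for this example; the point is only to wrap each stage separately so first-run and later-run numbers can be compared stage by stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def lex(text):
    # Toy stand-in for a real lexer: split on whitespace.
    return text.split()


def parse(tokens):
    # Toy stand-in for a real parser: group tokens in threes.
    return [tokens[i:i + 3] for i in range(0, len(tokens), 3)]


def run_pipeline(source):
    """Run each stage under its own timer; return the result and timings."""
    results = {}
    with timed("lex", results):
        tokens = lex(source)
    with timed("parse", results):
        tree = parse(tokens)
    return tree, results


if __name__ == "__main__":
    _, results = run_pipeline("let x = 1 ; " * 10_000)
    for label, seconds in results.items():
        print(f"{label:>6}: {seconds * 1000:.2f} ms")
```

Log these numbers on the first run and on a later run; whichever stage shows the large ratio is the one to split into still smaller pieces.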
You could try ngen'ing your assemblies as this removes the JIT compilation.
There are lots of things that can cause this. Just as one example: if you're using ADO.NET for data access with connection pooling turned on (which is the default), the first time your application runs it will take the hit of creating the database connection. When your app is closed, the connection is maintained in its open state by ADO.NET, so the next time your app runs and does data access it will not have to take the hit of instantiating the connection, and thus will appear faster.