Somebody asked me a question: which of the two scenarios below is faster?
Case 1: assume int count = 0;
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < 5; j++)
    {
        count++;
    }
}
Case 2: assume int count = 0;
for (int i = 0; i < 5; i++)
{
    for (int j = 0; j < 10; j++)
    {
        count++;
    }
}
In both scenarios the final value of count will be 50, but I am not sure which one will be faster. I think Case 2 is faster, but I'm not sure...
It would be great if anyone could throw some light on it: which is faster and why?
Actually, it is generally not a good idea to try to optimize such details yourself, because the compiler is usually much better at it (as long as the algorithm is not changed).
The total number of iterations is the same in both cases.
It might be that the second is a bit faster, because the initialization of the inner loop only happens 5 times instead of 10, but I doubt that would make any noticeable difference.
The best way to find out is to use a profiler or, even better, to analyze the generated assembly code.
Have you actually tried it?
I would assume that theoretically Case 2 would be marginally faster, since the outer loop would cause only half as much stack-based variable creation/destruction for the inner loop as in Case 1, but even better (and clearer) would be something like the sketch below:
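(What follows is my own guess at the intended suggestion: collapse the nested loops into a single flat loop, or skip the loop entirely.)
int count = 0;
for (int i = 0; i < 50; i++)   // one loop doing the same 50 increments
{
    count++;
}
// or, clearer still, no loop at all:
// int count = 50;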
To be precise, your examples are so abstracted that the answer is probably of little use. It very much depends on the context.
This is the only kind of example I can think of where it matters which variable you iterate in which loop; a sketch follows below.
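Roughly, assuming a two-dimensional array laid out row by row (the array a and the sizes n and m here are my own illustration, not from the original post):
#include <vector>

int main()
{
    const int n = 1000;
    const int m = 1000;
    std::vector<int> a(n * m);      // row-major: element (i, j) lives at a[i * m + j]

    // First version: the inner loop touches consecutive memory locations.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            a[i * m + j] = i + j;

    // Second version: each step of the inner loop jumps m locations ahead.
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            a[i * m + j] = i + j;

    return a[0];
}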
The second one is slower because you don't iterate the memory locations one after the other, but instead jump by m locations at a time; the processor caches the memory locations positioned immediately after an accessed location.

If you really want to know, make your outer loop a function with a single parameter of the maximum loop count (5 or 10), your inner loop a function which takes a single parameter of the maximum inner loop count (10 or 5), then compile without any optimisation, run both a million times, and time them.
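A rough sketch of such a timing harness (my own code, simplified to a single function taking both bounds as parameters rather than two separate functions); it assumes a C++11 compiler and, as suggested, should be built without optimisation:
#include <chrono>
#include <cstdio>

// Runs the nested loops from the question with configurable bounds.
int nested(int outer, int inner)
{
    int count = 0;
    for (int i = 0; i < outer; i++)
    {
        for (int j = 0; j < inner; j++)
        {
            count++;
        }
    }
    return count;
}

int main()
{
    using clock = std::chrono::steady_clock;
    volatile int sink = 0;                  // keep the results "used"

    clock::time_point t0 = clock::now();
    for (int r = 0; r < 1000000; r++)
        sink = nested(10, 5);               // Case 1
    clock::time_point t1 = clock::now();
    for (int r = 0; r < 1000000; r++)
        sink = nested(5, 10);               // Case 2
    clock::time_point t2 = clock::now();

    long long us1 = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    long long us2 = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    std::printf("Case 1: %lld us\nCase 2: %lld us\n", us1, us2);
    (void)sink;
    return 0;
}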
With any optimisation at all, your compiler will inline the functions, expand the loops, and calculate count as part of the optimisation process. My guess is that they'll do exactly the same amount of work, and you'll see exactly the same times.

Okay, tested this on my system. With full optimization, the compiler just made count = 50, with no questions asked. Without optimization, the second version usually was the slightest bit faster, but it was completely negligible.
The disassembly: both loops have precisely the same code, except the compares are once against 100 and once against 50 (I bumped the numbers up a bit to allow for a longer execution time).
The only difference between "big loop outside, small loop inside" and "small loop outside, big loop inside" is how often you have to take the jump out of the inner loop and how often you execute the initialization of the inner loop variable (which jumps over the increment part) for every new pass of the outer loop.
Additional instructions for each iteration of the inner loop: mov, add, mov, but no mov / jmp.
Additional instructions for each initialization of the inner loop: mov, jmp, plus the JGE coming true more often.
Thus if you enter the inner loop 50 times, that JGE only comes true 50 times, so you do 50 jumps there, while with the inner loop being entered 100 times you have to jump 100 times. That's the ONLY difference in the code. In this case it's hardly any difference, and most of the time your memory access will be the thing causing a slowdown a LOT more than your loop ordering. The only exception: if you know you can order your loops to avoid branch mispredictions.

So only two things make it worth ordering your loops one way or the other:
- memory access
- branch prediction (a sketch follows below)
For everything else the impact is completely negligible.
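To illustrate the branch-prediction point, here is a minimal sketch of my own (not from the thread); note that it varies the order of the data rather than the order of the loops, but the mechanism is the same: once the branch becomes predictable, the identical loop typically runs noticeably faster.
#include <algorithm>
#include <cstdlib>
#include <vector>

int main()
{
    std::vector<int> data(1000000);
    for (std::size_t i = 0; i < data.size(); i++)
        data[i] = std::rand() % 256;

    // Unsorted: the branch below is taken essentially at random,
    // so the branch predictor is frequently wrong.
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); i++)
        if (data[i] >= 128)
            sum += data[i];

    // Sorted: the same branch is now false for one long run and then
    // true for another, so the predictor is almost always right.
    std::sort(data.begin(), data.end());
    long long sorted_sum = 0;
    for (std::size_t i = 0; i < data.size(); i++)
        if (data[i] >= 128)
            sorted_sum += data[i];

    return (int)(sum - sorted_sum);         // keep both results "used"
}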