Consider the following code (p is of type unsigned char* and bitmap->width is of some integer type, exactly which is unknown and depends on which version of some external library we're using):
for (unsigned x = 0; x < static_cast<unsigned>(bitmap->width); ++x)
{
    *p++ = 0xAA;
    *p++ = 0xBB;
    *p++ = 0xCC;
}
Is it worth optimizing it [..]
Could there be a case where this could yield more efficient results by writing:
unsigned width(static_cast<unsigned>(bitmap->width));
for (unsigned x = 0; x < width; ++x)
{
    *p++ = 0xAA;
    *p++ = 0xBB;
    *p++ = 0xCC;
}
... or is this trivial for the compiler to optimize?
What would you consider to be "better" code?
Note from editor (Ike): for those wondering about the strikeout text, the original question, as phrased, was dangerously close to off-topic territory and was very close to being closed in spite of positive feedback. These have been stricken out. Yet please do not punish the answerers who addressed these stricken sections of the question.
At first glance, I thought the compiler could generate equivalent assembly for both versions with optimization flags activated. When I checked it, I was surprised to see the result:
Source: unoptimized.cpp (note: this code is not meant to be executed.)
Source: optimized.cpp (note: this code is not meant to be executed.)
Compilation
$ g++ -S -O3 unoptimized.cpp
$ g++ -S -O3 optimized.cpp
Assembly: unoptimized.s
Assembly: optimized.s
diff
The generated assembly for the optimized version does actually load (lea) the width constant, unlike the unoptimized version, which computes the width offset at each iteration (movq). When I get time, I'll eventually post some benchmarks on that. Good question.
There is actually insufficient information from your code snippet to be able to tell, and the one thing that I can think of is aliasing. From our point of view, it's pretty clear that you don't want p and bitmap to point to the same location in memory, but the compiler doesn't know that, and (because p is of type char*) the compiler has to make this code work even if p and bitmap overlap.

This means in this case that if the loop changes bitmap->width through the pointer p, then that has to be seen when re-reading bitmap->width later on, which in turn means that storing it in a local variable would be illegal.

That being said, I believe some compilers will actually sometimes generate two versions of the same code (I have seen circumstantial evidence of this, but have never directly sought out information on what the compiler is doing in this case), quickly check whether the pointers alias, and run the faster version if it determines that's okay.
That being said, I stand by my comment about simply measuring the performance of the two versions; my money is on not seeing any consistent performance difference between them.
In my opinion, questions like these are fine if your purpose is to learn about compiler optimization theories and techniques, but are a waste of time (a useless micro-optimization) if your end goal is to make the program run faster.
The only thing here that can prevent the optimization is the strict aliasing rule. In short, the exception to that rule also applies to unsigned and signed char pointers.

This is the case in your code: you're modifying *p through p, which is an unsigned char*, so the compiler must assume that it could point to bitmap->width. Hence the caching of bitmap->width would be an invalid optimization. This optimization-preventing behavior is shown in YSC's answer.

If and only if p pointed to a non-char and non-decltype(bitmap->width) type would the caching be a possible optimization.

The compiler is able to optimize a lot of things. For your example, you should go for readability, maintainability, and whatever follows your code standard. For more information about what can be optimized (with GCC), see this blog post.
The question originally asked:
And my answer to that (garnering a good mix of both up- and downvotes...):
Despite the downvotes (and now seeing the aliasing issue), I'm still happy with that as a valid answer. If you don't know if it's worth optimizing something, it probably isn't.
A rather different question, of course, would be this:
First, does your application or library need to run faster than it currently does? Is the user kept waiting too long? Does your software forecast yesterday's weather instead of tomorrow's?
Only you can really tell this, based on what your software is for and what your users expect.
Assuming your software does need some optimization, the next thing to do is start measuring. Profilers will tell you where your code spends its time. If your fragment isn't showing up as a bottleneck, it's best left alone. Profilers and other measuring tools will also tell you whether your changes have made a difference. It's possible to spend hours attempting to optimize code, only to find you've made no discernible difference.
If you're not writing 'optimized' code, then your code should be as clear, clean, and concise as you can make it. The "premature optimization is evil" argument isn't an excuse for sloppy or inefficient code.
Optimized code normally sacrifices some of the attributes above for performance. It could involve introducing additional local variables, having objects with wider-than-expected scope, or even reversing normal loop ordering. All of these may be less clear or concise, so document the code (briefly!) to explain why you're doing this.
But often, with 'slow' code, these micro-optimizations are the last resort. The first place to look is at algorithms and data structures. Is there a way of avoiding doing the work at all? Can linear searches be replaced with binary ones? Would a linked list be faster here than a vector? Or a hash table? Can I cache results? Making good 'efficient' decisions here can often affect performance by an order of magnitude or more!
There are two things to consider.
A) How often will the optimization run?
If the answer is not very often, like only when a user clicks a button, then don't bother if it makes your code unreadable. If the answer is a thousand times a second, then you will probably want to go with the optimization. If it is even a bit complex, be sure to put a comment in to explain what is going on, to help the next person who comes along.
B) Will this make the code harder to upkeep/troubleshoot?
If you're not seeing a huge gain in performance then making your code cryptic simply to save a few clock ticks is not a good idea. Lots of people will tell you that any good programmer should be able to look at the code and figure out what is going on. This is true. The problem is that in the business world the extra time figuring that out costs money. So, if you can make it prettier to read then do it. Your friends will thank you for it.
That said, I'd personally use the B example.