C++ iterators & loop optimization

2020-05-11 12:05发布

站内文章 / C++

19 0

爷、活的狠高调

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I see a lot of c++ code that looks like this:

for( const_iterator it = list.begin(),
     const_iterator ite = list.end();
     it != ite; ++it)

As opposed to the more concise version:

for( const_iterator it = list.begin();
     it != list.end(); ++it)

Will there be any difference in speed between these two conventions? Naively the first will be slightly faster since list.end() is only called once. But since the iterator is const, it seems like the compiler will pull this test out of the loop, generating equivalent assembly for both.

回答1:

I'll just mention for the record that the C++ standard mandates that calling begin() and end() on any container type (be it vector, list, map etc.) must take only constant time. In practice, these calls will almost certainly be inlined to a single pointer comparison if you compile with optimisations turned on.

Note that this guarantee does not necessarily hold for additional vendor-supplied "containers" that do not actually obey the formal requirements of being a container laid out in the chapter 23 of the standard (e.g. the singly-linked list slist).

回答2:

The two versions are not the same though. In the second version it compares the iterator against list.end() every time, and what list.end() evaluates to might change over the course of the loop. Now of course, you cannot modify list through the const_iterator it; but nothing prevents code inside the loop from just calling methods on list directly and mutating it, which could (depending on what kind of data structure list is) change the end iterator. It might therefore be incorrect in some circumstances to store the end iterator beforehand, because that may no longer be the correct end iterator by the time you get to it.

回答3:

The first one will probably almost always be faster, but if you think this will make a difference, always profile first to see which is faster and by how much.

The compiler will probably be able to inline the call to end() in both cases, although if end() is sufficiently complicated, it may opt not to inline it. However, the key optimization is whether or not the compiler can perform loop-invariant code motion. I would posit that in most cases, the compiler can't be certain that the value of end() won't change during the iteration of the loop, in which case it has no choice but to call end() after each iteration.

回答4:

I would choose the option that is most concise and readable. Don't try to second guess the compiler and the optimisations it might perform. Remember that the vast majority of your code will have absolutely no effect on overall performance, so only if this is in a performance critical section of code should you spend the time to profile it and choose an appropriately efficient source representation.

With specific reference to your example, the first version makes a copy of the end() iterator, invoking whatever code runs for the copy constructor of the iterator object. STL containers generally contain inline end() functions, so the compiler has plenty of opportunity to optimise the second version even if you're not trying to help it out. Which one is best? Measure them.

回答5:

You can make the first version more concise and get the best of both:

for( const_iterator it = list.begin(), ite = list.end();
     it != ite; ++it)

P.S. The iterators aren't const, they're iterators to a const reference. There's a big difference.

回答6:

Consider this example:

for (const_iterator it = list.begin(); it != list.end(); ++list)
{
    if (moonFull())
        it = insert_stuff(list);
    else
        it = erase_stuff(list);
}

in this case, you NEED to call list.end() in the loop, and the compiler isn't going to optimize that away.

Other cases where the compiler can prove that end() is always returning the same value, optimization can take place.

If we are talking about STL containers, than i think any good compiler can optimize away multiple end() calls when multiple end() calls is not needed for the programming logic. However, if you have a custom container and implementation of end() is not in the same translation unit, than optimization will have to happen at link time. I know very little about link time optimization, but i'll bet most linkers will not do such optimization.

回答7:

Aah, people seem to be making guesses. Open up your code in the debugger & you will see that the calls to begin(), end() etc everything is optimized away. There is no need to use version 1. Tested with Visual C++ compiler fullopt.

回答8:

The compiler might be able to optimize the second into the first, but that assumes that the two are equivalent, i.e. end() actually is constant. A slightly more problematic issue is that the compiler may be unable to deduce that the end iterator is constant due to possible aliasing. However, assuming that the call to end() is inlined, the difference is just a memory load.

Note that this assumes that the optimizer is enabled. If the optimizer is not enabled, as is often done in debug builds, then the second formulation will involve N-1 more function calls. In current versions of Visual C++, debug builds will also incur additional hits due to function prolog/epilog checks and heavier debug iterators. Therefore, in STL heavy code, defaulting to the first case can prevent code from being disproportionately slow in debug builds.

Insertion and removal within the loop are a possibility, as others have pointed out, but with this style of loop I find that unlikely. For one thing, node-based containers -- list, set, map -- don't invalidate end() on either operation. Second, the iterator increment frequently has to be moved in-loop to avoid invalidation problems:

   // assuming list -- cannot cache end() for vector
   iterator it(c.begin()), end(c.end());
   while(it != end) {
       if (should_remove(*it))
           it = c.erase(it);
       else
           ++it;
   }

Thus, I consider a loop that claims to call end() for mutate-during-loop reasons and still has ++it in the loop header to be suspect.

回答9:

Sample it under stress conditions and see if you're in ** this code very often ***.
If not, it doesn't matter.
If you are, look at the disassembly, or single-step it.
That's how you can tell which is faster.

You gotta be careful of these iterators.
They may get optimized into nice machine code, but often enough they don't, and become time hogs.

** (Where "in" means actually in it, or being called from it.)

*** (Where "often" means a significant percentage of the time.)

ADDED: Don't just see how many times per second the code is executed. It could be 1,000 times a second and still be using less than 1% of the time.

Don't time how long it takes either. It could take a millisecond and still be using less than 1% of the time.

You could multiply the two, to get a better idea, but that only works if they're not too skewed.

Sampling the call stack will tell you if it uses a high enough percentage of time to matter.

回答10:

I always preferred the first one. Though with inline functions, compiler optimizations and relatively smaller container size ( in my case it's normally max 20-25 items) it really does not make any large difference with respect to performace.

const_iterator it = list.begin();
const_iterator endIt = list.end();

for(; it != endIt ; ++it)
{//do something
}

But recently I am using more of std::for_each wherever it's possible. Its optimized looping which helps to make the code look more readable than other two.

std::for_each(list.begin(), list.end(), Functor());

I will use the loop only when std::for_each cannot be used. (for ex: std::for_each does not allow you to break the loop unless an exception is thrown).

回答11:

In theory the compiler could optimise the second version into the first (assuming the container doesn't change during the loop, obviously).

In practice, I've found several similar cases when profiling time-critical code where my compiler has totally failed to hoist invariant calculations out of loop conditions. So while the slightly more concise version is fine in most cases, I don't rely on the compiler doing sensible things with it for a case where I'm really concerned about performance.