STL map insertion efficiency: [] vs. insert

2019-03-24 22:37发布

问题:

There are two ways of map insertion:

m[key] = val;

Or

m.insert(make_pair(key, val));

My question is, which operation is faster? People usually say the first one is slower, because the STL Standard at first 'insert' a default element if 'key' is not existing in map and then assign 'val' to the default element.

But I don't see the second way is better because of 'make_pair'. make_pair actually is a convenient way to make 'pair' compared to pair<T1, T2>(key, val). Anyway, both of them do two assignments, one is assigning 'key' to 'pair.first' and two is assigning 'val' to 'pair.second'. After pair is made, map inserts the element initialized by 'pair.second'.

So the first way is 1. 'default construct of typeof(val)' 2. assignment the second way is 1. assignment 2. 'copy construct of typeof(val)'

回答1:

Both accomplish different things.

m[key] = val;

Will insert a new key-value pair if the key doesn't exist already, or it will overwrite the old value mapped to the key if it already exists.

m.insert(make_pair(key, val));

Will only insert the pair if key doesn't exist yet, it will never overwrite the old value. So, choose accordingly to what you want to accomplish.
For the question what is more efficient: profile. :P Probably the first way I'd say though. The assignment (aka copy) is the case for both ways, so the only difference lies in construction. As we all know and should implement, a default construction should basically be a no-op, and thus be very efficient. A copy is exactly that - a copy. So in way one we get a "no-op" and a copy, and in way two we get two copies.
Edit: In the end, trust what your profiling tells you. My analysis was off like @Matthieu mentions in his comment, but that was my guessing. :)


Then, we have C++0x coming, and the double-copy on the second way will be naught, as the pair can simply be moved now. So in the end, I think it falls back on my first point: Use the right way to accomplish the thing you want to do.



回答2:

On a lightly loaded system with plenty of memory, this code:

#include <map>
#include <iostream>
#include <ctime>
#include <string>

using namespace std;

typedef map <unsigned int,string> MapType;
const unsigned int NINSERTS = 1000000;

int main() {
    MapType m1;
    string s = "foobar";
    clock_t t = clock();
    for ( unsigned int i = 0; i < NINSERTS; i++ ) {
        m1[i] = s;
    }
    cout << clock() - t << endl;
    MapType m2;
    t = clock();
    for ( unsigned int i = 0; i < NINSERTS; i++ ) {
        m2.insert( make_pair( i, s ) );
    }
    cout << clock() - t << endl;
}

produces:

1547
1453

or similar values on repeated runs. So insert is (in this case) marginally faster.



回答3:

Performance wise I think they are mostly the same in general. There may be some exceptions for a map with large objects, in which case you should use [] or perhaps emplace which creates fewer temporary objects than 'insert'. See the discussion here for details.

You can, however, get a performance bump in special cases if you use the 'hint' function on the insert operator. So, looking at option 2 from here:

iterator insert (const_iterator position, const value_type& val);

the 'insert' operation can be reduced to constant time (from log(n)) if you give a good hint (often the case if you know you are adding things at the back of your map).



回答4:

We have to refine the analysis by mentioning that the relative performance depends on the type(size) of the objects being copied as well.

I did a similar experiment (to nbt) with a map of (int -> set). I know it is a terrible thing to do, but, illustrative for this scenario. The "value", in this case a set of ints, has 20 elements in it.

I execute a million iterations of the []= Vs. insert operations and do RDTSC/iter-count.

[] = set | 10731 cycles

insert(make_pair<>) | 26100 cycles

It shows the magnitude of penalty added due to the copying. Of course, CPP11(move ctor's) will change the picture.



回答5:

My take on it: Worth reminding that maps is a balanced binary tree, most of the modifications and checks take O(logN).

Depends really on the problem you are trying to solve.

1) if you just want to insert the value knowing that it is not there yet, then [] would do two things: a) check if the item is there or not b) if it is not there will create pair and do what insert does ( double work of O( logN ) ), so I would use insert.

2) if you are not sure if it is there or not, then a) if you did check if the item is there by doing something like if( map.find( item ) == mp.end() ) couple of lines above somewhere, then use insert, because of double work [] would perform b) if you didn't check, then it depends, cause insert won't modify the value if it is there, [] will, otherwise they are equal