Python dictionary iterator performance

2020-04-05 09:01发布

问题:

When working with dictionaries in Python, this page says that the time complexity of iterating through the element of the dictionary is O(n), where n is the largest size the dictionary has been.

However, I don't think that there is an obvious way to iterate through the elements of a hash table. Can I assume good performance of dict.iteritems() when iterating through element of a hash table, without too much overhead?

Since dictionaries are used a lot in Python, I assume that this is implemented in some smart way. Still, I need to make sure.

回答1:

If you look at the notes on Python's dictionary source code, I think the relevant points are the following:

Those methods (iteration and key listing) loop over every potential entry

How many potential entries will there be, as a function of largest size (largest number of keys ever stored in that dictionary)? Look at the following two sections in the same document:

Maximum dictionary load in PyDict_SetItem. Currently set to 2/3

Growth rate upon hitting maximum load. Currently set to *2.

This would suggest that the sparsity of a dictionary is going to be somewhere around 1/3~2/3 (unless growth rate is set to *4, then it's 1/6~2/3). So basically you're going to be checking upto 3 (or 6 if *4) potential entries for every key.

Of course, whether it's 3 entries or 1000, it's still O(n) but 3 seems like a pretty acceptable constant factor.

Incidentally here are the rest of the source & documentation, including that of the DictObject:

http://svn.python.org/projects/python/trunk/Objects/