I've seen other Python programmers use defaultdict from the collections module for the following use case:
from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

def main():
    d = defaultdict(list)
    for k, v in s:
        d[k].append(v)
I've typically approached this problem by using setdefault instead:
def main():
    d = {}
    for k, v in s:
        d.setdefault(k, []).append(v)
The docs do in fact claim that using defaultdict is faster, but I've found the opposite when testing it myself:
$ python -mtimeit -s "from withsetdefault import main; s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)];" "main()"
100000 loops, best of 3: 4.51 usec per loop
$ python -mtimeit -s "from withdefaultdict import main; s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)];" "main()"
100000 loops, best of 3: 5.38 usec per loop
Is there something wrong with how I've set up the tests?
For reference, I'm using Python 2.7.3 [GCC 4.2.1 (Apple Inc. build 5666)].
Yes, there is something "wrong":
You have put the creation of the
(default)dict
into the statement instead of the setup. Constructing a newdefaultdict
is more expensive than a normaldict
, and usually that's not the bottleneck you should be profiling in a program - after all, you build your data structures once but you use them many times.If you do your tests like below, you see that
If you do your tests like below, you see that the defaultdict operations are indeed faster (Python 2.7.3 on Win7 x64):
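The answer's original benchmark code is not reproduced here; what follows is a minimal sketch of the kind of setup it describes, with the dictionary created in timeit's -s setup so that only the per-item work is measured. The module name bench and the function names are hypothetical. Because the same dict is reused across loop iterations its lists keep growing, but list.append is amortized O(1), so this reuse is a tolerable simplification for a rough comparison.

# bench.py -- hypothetical module; a sketch, not the answer's original code
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

def fill_defaultdict(d):
    # d is a defaultdict(list) built in the timeit setup, so only the appends are timed
    for k, v in s:
        d[k].append(v)

def fill_setdefault(d):
    # d is a plain dict built in the timeit setup
    for k, v in s:
        d.setdefault(k, []).append(v)

$ python -mtimeit -s "from collections import defaultdict; from bench import fill_defaultdict; d = defaultdict(list)" "fill_defaultdict(d)"
$ python -mtimeit -s "from bench import fill_setdefault; d = {}" "fill_setdefault(d)"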