The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
    # variant 1)
    d1[key] = d1.get(key, []) + [val]
    # variant 2)
    d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?
The logic of dict.get is:
Take an example:
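A minimal sketch of that logic, assuming a hypothetical helper named get_like (the helper and the sample data are illustrative only): dict.get returns the stored value when the key exists and the default otherwise, and it never changes the dictionary.

```python
def get_like(d, key, default=None):
    """Illustrative pure-Python equivalent of d.get(key, default)."""
    try:
        return d[key]
    except KeyError:
        # Missing key: hand back the default and leave d unchanged.
        return default

d = {'a': [1]}
found = get_like(d, 'a', [])     # the list stored under 'a'
missing = get_like(d, 'b', [])   # a fresh empty list
unchanged = 'b' not in d         # nothing was inserted for 'b'
```

Because the default is only returned and not stored, variant 1 in the question has to assign the result back to d1[key] itself.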
The mechanism of setdefault is:
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
It's very simple: the element is appended either to the list already stored under the key or to a newly stored empty list.
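A sketch of what setdefault does, written as a hypothetical helper named setdefault_like (the name is made up for illustration), next to the loop from the question using the real method:

```python
def setdefault_like(d, key, default=None):
    """Illustrative pure-Python equivalent of d.setdefault(key, default)."""
    if key not in d:
        d[key] = default   # store the default only when the key is absent
    return d[key]          # always return the object stored in the dict

data = [('a', 1), ('b', 1), ('b', 2)]

d = {}
for key, val in data:
    # Existing list or freshly stored empty list: append works on either.
    d.setdefault(key, []).append(val)

check = {}
for key, val in data:
    setdefault_like(check, key, []).append(val)
```

Since the returned list is the same object that is stored in the dict, appending to it is enough and no assignment back to d[key] is needed.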
There is also defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:
Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
and got
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See "Use cases for the 'setdefault' dict method" and "dict.get() method returns a pointer" for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).
1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
2. More explanation: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to "get, or set & get", or "set if necessary, then get". It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally, the official docs, with the difference highlighted: http://docs.python.org/2/library/stdtypes.html
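The remark in point 2 about the default always being evaluated can be illustrated with a small sketch. The function make_default is a hypothetical stand-in for an expensive default and simply counts how often it is called:

```python
from collections import defaultdict

calls = {'setdefault': 0, 'defaultdict': 0}

def make_default(counter_name):
    """Hypothetical stand-in for an expensive default; counts its calls."""
    calls[counter_name] += 1
    return []

keys = ['a', 'b', 'a', 'a']

plain = {}
for k in keys:
    # The argument is evaluated before setdefault runs, even for existing keys.
    plain.setdefault(k, make_default('setdefault')).append(1)

lazy = defaultdict(lambda: make_default('defaultdict'))
for k in keys:
    # The factory runs only when the key is missing.
    lazy[k].append(1)

print(calls)  # {'setdefault': 4, 'defaultdict': 2}
```

With four lookups over two distinct keys, the default is built four times for setdefault but only twice for defaultdict.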
You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
There's more here.
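A minimal sketch of a defaultdict version of the loop from the question (the name d3 is chosen here for illustration):

```python
from collections import defaultdict

data = [('a', 1), ('b', 1), ('b', 2)]

d3 = defaultdict(list)   # missing keys start out as an empty list
for key, val in data:
    d3[key].append(val)

print(dict(d3))  # {'a': [1], 'b': [1, 2]}
```

One thing to keep in mind is that merely reading a missing key with d3[key] inserts it, which is not the case for dict.get.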
The accepted answer from agf isn't comparing like with like. After:
d[0] contains a list with 10,000 items whereas after:
d[0] is simply [], i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
and in fact is faster than the faulty setdefault example.
The difference here really arises because, when you append using concatenation, the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable). Using append, the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming
d3, d4 = defaultdict(list), {}
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
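A rough harness for trying the four variants yourself. Variants 1 and 2 come from the question; variants 3 and 4 are reconstructed from the description above (defaultdict and an explicit membership test), so treat their exact form as an assumption. The single key 0 and the 10,000 iterations follow the d[0] discussion earlier in this answer, and the repeat count is arbitrary. Absolute timings depend on the machine and Python version, so none are shown.

```python
import timeit
from collections import defaultdict

def variant1(n):
    d = {}
    for i in range(n):
        d[0] = d.get(0, []) + [i]   # builds a new list on every iteration
    return d

def variant2(n):
    d = {}
    for i in range(n):
        d.setdefault(0, []).append(i)
    return d

def variant3(n):
    d = defaultdict(list)
    for i in range(n):
        d[0].append(i)
    return d

def variant4(n):
    d = {}
    for i in range(n):
        if 0 not in d:              # explicit membership test
            d[0] = []
        d[0].append(i)
    return d

if __name__ == '__main__':
    for fn in (variant1, variant2, variant3, variant4):
        seconds = timeit.timeit(lambda: fn(10000), number=3)
        print(fn.__name__, round(seconds, 4))
```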
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
For those who are still struggling to understand these two terms, let me explain the basic difference between the get() and setdefault() methods:
Scenario-1
Scenario-2
In Scenario-1 the output will be {'A': []}, while in Scenario-2 it will be {}.
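A sketch of what the two scenarios could look like, assuming Scenario-1 calls setdefault() and Scenario-2 calls get() on an empty dict with the key 'A' (the variable names are illustrative):

```python
# Scenario-1 (assumed): setdefault on a missing key
first = {}
r1 = first.setdefault('A', [])   # returns [] and stores it under 'A'

# Scenario-2 (assumed): get on a missing key
second = {}
r2 = second.get('A', [])         # returns [] but stores nothing

print(first)    # {'A': []}
print(second)   # {}
```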
So setdefault() sets absent keys in the dict, while get() only provides a default value and does not modify the dictionary.
Now let's come to where this is useful. Suppose you are looking up an element in a dict whose value is a list, and you want to modify that list if it is found, or otherwise create a new key with that list.
Using setdefault():
Using get():
Now let's examine the timings:
Took 288 ns
Took 128 s
So there is a very large timing difference between these two approaches.