I was wondering whether there is a clear, concise way to add an item to a set and check whether it was actually added, without hashing and looking it up twice.
This is what you might do, but it hashes item twice:
    if item not in some_set:  # <-- hash & lookup
        some_set.add(item)    # <-- hash & lookup again, to check whether the item is already in the set
        other_task()
This works with a single hash and lookup but is a bit ugly.
    some_set_len = len(some_set)
    some_set.add(item)
    if some_set_len != len(some_set):
        other_task()
Is there a better way to do this using Python's set API?
I don't think there's a built-in way to do this. You could, of course, write your own function:
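A minimal sketch of such a function, assuming it simply wraps the length comparison from the question. The name `add_and_check` is hypothetical and not part of Python's set API:

```python
def add_and_check(some_set, item):
    """Add item to some_set; return True only if it was not already present."""
    size_before = len(some_set)
    some_set.add(item)  # the only hash and lookup of item
    # The set grew if and only if item was new.
    return len(some_set) != size_before


seen = set()
print(add_and_check(seen, "a"))  # True: "a" was added
print(add_and_check(seen, "a"))  # False: "a" was already there
```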
Or, if you prefer cryptic one-liners:
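One possible form of such a one-liner, sketched on the assumption that it compares the length before and after the add inside a single expression (the function name is hypothetical):

```python
def add_and_check_oneliner(some_set, item):
    # The left operand len(some_set) is evaluated first. Then some_set.add(item)
    # runs and returns None, so `or` falls through to the updated len(some_set).
    return len(some_set) != (some_set.add(item) or len(some_set))


seen = set()
print(add_and_check_oneliner(seen, "a"))  # True
print(add_and_check_oneliner(seen, "a"))  # False
```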
(This relies on the left-to-right evaluation order and on the fact that set.add() always returns None, which is falsey.)

All this aside, I would only consider doing this if the double hashing/lookup is demonstrably a performance bottleneck and if using a function is demonstrably faster.
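To check whether the double lookup actually matters for a given workload, something like the following `timeit` sketch could be used. The helper names, the input size, and the roughly 10% duplicate rate are assumptions for illustration only; timings depend on the machine and Python version, so none are shown:

```python
import timeit


def add_double_lookup(some_set, item):
    # Membership test followed by add: two hashes and lookups for new items.
    if item not in some_set:
        some_set.add(item)
        return True
    return False


def add_len_check(some_set, item):
    # Single add, then compare sizes to see whether the set grew.
    size_before = len(some_set)
    some_set.add(item)
    return len(some_set) != size_before


def bench(func, items, repeat=5):
    """Return the best wall-clock time for adding all items to a fresh set."""
    def run():
        s = set()
        for item in items:
            func(s, item)
    return min(timeit.repeat(run, number=1, repeat=repeat))


# 100,000 adds, of which 10,000 repeat an earlier value.
items = [i % 90_000 for i in range(100_000)]
print("double lookup:", bench(add_double_lookup, items))
print("len check:    ", bench(add_len_check, items))
```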
Dictionaries have the nice setdefault method, which avoids a whole class of problems related to the "double lookup" mentioned in the question. Since, in CPython at least, most of the set code is shared with dictionaries, I tried using it when working with a very large set (500k+ additions, roughly 10% of them duplicate entries).
In addition, in order to reduce the overhead implied by the Python symbol name lookup, I wrapped that in a higher-order function so the compiler will build a closure and so will be able to use the index-based LOAD_FAST/LOAD_DEREF opcodes instead of the more expensive name-lookup-based LOAD_ATTR/LOAD_GLOBAL.

In my particular use case, this solution runs more than 20% faster than the one suggested in the other answer. Of course, your mileage may vary, so you should run your own tests.
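A minimal sketch of the closure-based approach described above, assuming a dict stands in for the set and a per-call counter serves as the value passed to `setdefault`. The names `make_adder`, `add`, and `count` are hypothetical:

```python
def make_adder():
    """Return an add(item) function that reports whether item was new."""
    seen = {}                     # dict used in place of a set
    setdefault = seen.setdefault  # bound method captured once by the closure
    count = 0

    def add(item):
        nonlocal count
        count += 1
        # setdefault returns the stored value: the current count if item was
        # just inserted, or a smaller, earlier count if it was already present.
        return setdefault(item, count) == count

    return add


add = make_adder()
print(add("a"))  # True: newly inserted
print(add("a"))  # False: already present
```

Because `setdefault` and `count` are closure variables, calls inside `add` avoid a global name lookup and an attribute lookup on each call.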
For reference, here is the disassembled code of both solutions (Python 3.5 running on Linux):
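Opcode names and layout vary between CPython versions, so the listings are best regenerated locally. A minimal sketch using the standard `dis` module; the two functions are hypothetical stand-ins for the solutions discussed above, not the exact code that was measured:

```python
import dis


def add_with_len(some_set, item):
    # Length-comparison approach.
    size_before = len(some_set)
    some_set.add(item)
    return len(some_set) != size_before


def make_adder():
    # Closure-based approach built on dict.setdefault.
    seen = {}
    setdefault = seen.setdefault
    count = 0

    def add(item):
        nonlocal count
        count += 1
        return setdefault(item, count) == count

    return add


# dis.Bytecode(...).dis() returns the formatted listing as a string.
listing_len = dis.Bytecode(add_with_len).dis()
listing_closure = dis.Bytecode(make_adder()).dis()
print(listing_len)
print(listing_closure)
```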