Say I have different sets (they have to be different, I cannot join them as per the kind of data I am working with):
r = set([1,2,3])
s = set([4,5,6])
t = set([7,8,9])
What is the best way to check if a given variable is present in any of them?
I am using:
if myvar in r \
or myvar in s \
or myvar in t:
But I wonder if this can be reduced somehow by using set's properties such as union.
The following works, but I can't find a way to define multiple unions:
if myvar in r.union(s) \
or myvar in t:
I am also wondering if this union will somehow affect performance, since I guess a temporary set will be created on the fly.
You can use builtin any:
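A minimal sketch of what this could look like, reusing r, s and t from the question (the value of myvar is an assumption chosen for illustration):

```python
r = set([1, 2, 3])
s = set([4, 5, 6])
t = set([7, 8, 9])

myvar = 5  # assumed example value

# The generator expression tests one set at a time; any() stops at the first hit.
found = any(myvar in group for group in (r, s, t))

if found:
    print("myvar is in at least one of the sets")
```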
any will short circuit on the first condition that returns True, so you can get around constructing a potentially huge union or checking potentially lots of sets for inclusion.
"And I am also wondering if this union will affect somehow performance, since I guess a temporary set will be created on the fly."
According to wiki.python.com, s|t is O(len(s)+len(t)) while lookups are O(1).
For n sets with l elements each, doing union iteratively to construct the set costs O(l+l) for a.union(b), O(2l+2l+l) for a.union(b).union(c), and so on, which sums up to O((n*(n+1)/2)*l).
O(n^2*l) is quadratic and voids the performance advantage of using sets.
The lookup in n sets with any will perform at O(n).
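To make the arithmetic concrete, here is a rough cost model rather than a measurement. It assumes the n sets are disjoint, that each s|t costs len(s)+len(t), and that the union is built pairwise from left to right. The function names are made up for this sketch.

```python
def chained_union_cost(n, l):
    """Elements touched when building a.union(b).union(c)... over n disjoint sets of size l."""
    total = 0
    size = l  # size of the accumulated set so far
    for _ in range(n - 1):
        total += size + l  # one union costs len(accumulated) + len(next set)
        size += l          # the accumulated set grows by l each step
    return total


def any_lookup_cost(n):
    """Worst-case number of O(1) membership tests when probing n sets with any()."""
    return n


# Three sets of 1000 elements: 2l for the first union plus 3l for the second.
print(chained_union_cost(3, 1000), any_lookup_cost(3))
```

Under these assumptions the union cost grows with both n and l, while the any() cost depends only on n.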
Just use any:
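One hedged way to see the short circuit in action is to count the lookups. CountingSet below is a hypothetical helper written only for this illustration, and the value of myvar is assumed:

```python
class CountingSet(set):
    """Hypothetical helper: a set that counts how often it is probed with `in`."""

    def __init__(self, *args):
        super().__init__(*args)
        self.lookups = 0

    def __contains__(self, item):
        self.lookups += 1
        return super().__contains__(item)


r = CountingSet([1, 2, 3])
s = CountingSet([4, 5, 6])
t = CountingSet([7, 8, 9])

myvar = 5  # assumed example value
found = any(myvar in group for group in (r, s, t))

# r and s were probed once each; t was never touched because any() stopped at s.
print(found, r.lookups, s.lookups, t.lookups)
```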
Set lookups are O(1), so creating a union to check if the variable is in any set is totally unnecessary. Simply check using in with any, which will short circuit as soon as a match is found and does not create a new set.
"And I am also wondering if this union will affect somehow performance"
Yes, of course unioning the sets affects performance. It adds to the complexity: you are creating a new set every time, which is O(len(r)+len(s)+len(t)), so you can say goodbye to the real point of using sets, which is efficient lookups.
So the bottom line is that if you want to keep efficient lookups, you will have to union the sets once and keep the result in memory in a new variable, then use that to do your lookup for myvar, so the initial creation will be O(n) and lookups will be O(1) thereafter. If you don't, then every time you want to do a lookup you first create the union, which gives you a solution linear in the length of r+s+t -> set.union(*(r, s, t)) as opposed to at worst three constant (on average) lookups. That also means always adding or removing any elements from the new unioned set that are added/removed from r, s or t.
Some realistic timings on moderately large sized sets show exactly the difference:
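The original measurements are not reproduced here. As a sketch of how such a comparison could be set up with timeit, the harness below times three variants. The set sizes, the repeat count and the function names are arbitrary assumptions, and the numbers it prints will vary by machine.

```python
import timeit

# Three disjoint, moderately large sets (sizes are an arbitrary assumption).
r = set(range(0, 100000))
s = set(range(100000, 200000))
t = set(range(200000, 300000))
myvar = 299999  # lives in the last set, the worst case for any()


def with_any():
    return any(myvar in group for group in (r, s, t))


def with_union():
    # Rebuilds a 300,000-element set on every call.
    return myvar in set.union(r, s, t)


combined = set.union(r, s, t)  # built once; must be kept in sync with r, s and t


def with_precomputed():
    return myvar in combined


for func in (with_any, with_union, with_precomputed):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.6f}s for 20 calls")
```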
Timing the union shows that pretty much all the time is spent in the union calls:
Using larger sets and getting the element in the last set:
There is literally no difference no matter how large the sets get using any, but as the set sizes grow so does the running time using union.
The only way to make it faster would be to stick to or, but we are talking about a difference of a few hundred nanoseconds, which is the cost of creating the generator expression and the function call.
To union sets, set.union(*(r, s, t)) is also the fastest as you don't build intermediary sets:
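A small sketch of the two ways to build the union, using the sets from the question. It only shows that both forms give the same result, not how fast they are:

```python
r = set([1, 2, 3])
s = set([4, 5, 6])
t = set([7, 8, 9])

# Chained form: r.union(s) builds a temporary set that is then copied again.
chained = r.union(s).union(t)

# Single call: all operands are merged into one new set, with no intermediate result.
single = set.union(*(r, s, t))  # same as set.union(r, s, t) or r.union(s, t)

print(chained == single)
```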
You can simply do
if myvar in r.union(s).union(t):
And you needn't worry about performance here. Yes, it creates a temporary set on the fly, but as it isn't stored it gets garbage collected.
| is the union operator for sets in Python. You can define a union over multiple sets by chaining |.
You can use the reduce function to apply a function of two arguments cumulatively to the items of an iterable.
For checking membership in any of them, you can use a generator expression within any, which is more efficient here because Python uses a hash table to store sets, and checking membership is O(1) in such data structures, like dictionaries or frozenset.
Also, to check membership in all of your sets, use all.
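The approaches above can be sketched as follows, with the sets from the question and an assumed value for myvar. In Python 3, reduce lives in functools, and operator.or_ is used here as the two-argument function, which is one possible choice.

```python
from functools import reduce
from operator import or_

r = set([1, 2, 3])
s = set([4, 5, 6])
t = set([7, 8, 9])
myvar = 5  # assumed example value

# Union of several sets with the | operator.
union_chained = r | s | t

# The same union built by applying | cumulatively with reduce.
union_reduced = reduce(or_, (r, s, t))

# Membership in at least one set, without building a union.
in_any = any(myvar in group for group in (r, s, t))

# Membership in every set.
in_all = all(myvar in group for group in (r, s, t))

print(union_chained == union_reduced, in_any, in_all)
```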
But in this case (not for large sets) using the or operator is faster.
This is a benchmark of all the approaches:
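The original benchmark output is not reproduced here. A comparable run on the small sets from the question could be set up roughly like this. The value of myvar, the labels and the repeat count are assumptions, and the timings will differ between machines and Python versions.

```python
import timeit
from functools import reduce
from operator import or_

r = set([1, 2, 3])
s = set([4, 5, 6])
t = set([7, 8, 9])
myvar = 8  # assumed value, found in the last set

# Each candidate answers the same question: is myvar in r, s or t?
candidates = {
    "or": lambda: myvar in r or myvar in s or myvar in t,
    "any": lambda: any(myvar in group for group in (r, s, t)),
    "chained |": lambda: myvar in (r | s | t),
    "reduce": lambda: myvar in reduce(or_, (r, s, t)),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100000)
    print(f"{name}: {seconds:.4f}s for 100,000 calls")
```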
Note that, as @Padraic Cunningham mentioned, for large sets using any is much more efficient!