While trying to learn Python, I encountered the following:
>>> set('spam') - set('ham')
set(['p', 's'])
Why is the result set(['p', 's'])? I mean: why is 'h' missing?
The - operator on Python sets is mapped to the difference method, which is defined as the members of set A which are not members of set B. So in this case, the members of "spam" which are not in "ham" are "s" and "p". Notice that this method is not commutative (that is, a - b == b - a is not always true).
You may be looking for the symmetric_difference method, or its operator form ^:
>>> set("spam") ^ set("ham")
{'h', 'p', 's'}
This operator is commutative.
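As a quick check of both claims, here is a minimal sketch; the names a, b, left and right are only illustrative:

```python
a = set("spam")
b = set("ham")

# Difference depends on operand order.
left = a - b    # elements of a that are not in b
right = b - a   # elements of b that are not in a
print(left == right)       # False

# Symmetric difference gives the same set either way.
print((a ^ b) == (b ^ a))  # True
```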
Because that is the definition of a set difference. In plain English, it is equivalent to "what elements are in A that are not also in B?".
Note that reversing the operands makes this more obvious:
>>> set('spam') - set('ham')
{'s', 'p'}
>>> set('ham') - set('spam')
{'h'}
To get the elements that appear in exactly one of the two sets, regardless of the order of the operands, you can use symmetric_difference:
>>> set('spam').symmetric_difference(set('ham'))
{'s', 'h', 'p'}
There are two different operators:
A - B or A.difference(B).
A ^ B or A.symmetric_difference(B).
Your code is using the former, whereas you seem to be expecting the latter.
The set difference is the set of all characters in the first set that are not in the second set. 'p' and 's' appear in the first set but not in the second, so they are in the set difference. 'h' does not appear in the first set, so it is not in the set difference (regardless of whether or not it is in the second set).
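To make that definition concrete, the difference can be rebuilt by hand with a set comprehension. This is only a sketch of the definition, not how Python implements the - operator; the names first, second and manual are made up for illustration:

```python
first = set("spam")
second = set("ham")

# Keep each element of the first set that is absent from the second.
manual = {ch for ch in first if ch not in second}

print(manual == first - second)  # True
print("h" in manual)             # False: 'h' was never in the first set
```

Because the comprehension only iterates over the first set, anything that occurs solely in the second set can never end up in the result.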
You can also obtain the desired result as:
>>> (set('spam') | set('ham')) - (set('spam') & set('ham'))
set(['p', 's', 'h'])
Create the union using | and the intersection using &, then take the set difference, i.e. the difference between all elements and the common elements.
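A small sketch, with illustrative variable names, that walks through those steps and confirms the result matches the ^ operator for these two inputs:

```python
a = set("spam")
b = set("ham")

union = a | b            # every element from either set
common = a & b           # elements present in both sets: 'a' and 'm'
result = union - common  # drop the shared elements

print(result == a ^ b)   # True
```

Sets are unordered, so the order in which the elements are printed may differ between runs and Python versions.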