I need a bag/multiset-like data type in Python. I understand collections.Counter is often used for this purpose. But the comparison operators don't seem to work:
In [1]: from collections import Counter
In [2]: bag1 = Counter(a=1, b=2, c=3)
In [3]: bag2 = Counter(a=2, b=2)
In [4]: bag1 > bag2
Out[4]: True
This seems like a bug to me. I expected the less-than and greater-than operators to perform set-like subset and superset comparisons. But if that were the case then bag1 > bag2
would be false because bag2
contains an extra 'a'
. There also don't seem to be subset/superset methods on Counter objects. So I have two questions:
- What comparison logic is used for Counter objects?
- How can I compare Counter objects for subset, superset, proper-subset, and proper-superset?
On Python 2, the comparison falls back to the default sort order for dictionaries (Counter
is a subclass of dict
).
Mappings (dictionaries) compare equal if and only if their sorted
(key, value) lists compare equal. [5] Outcomes other than equality are
resolved consistently, but are not otherwise defined. [6]
On Python 3, the comparison raises a TypeError
:
Mappings (dictionaries) compare equal if and only if they have the
same (key, value) pairs. Order comparisons ('<', '<=', '>=', '>')
raise TypeError
.
This unanswered question is of interest:
How can I compare Counter objects for subset, superset, proper-subset, and proper-superset?
By defining the missing “rich comparison methods". You could also use free functions instead, which will make client code more explicit.
from collections import Counter
class PartiallyOrderedCounter(Counter):
def __le__(self, other):
""" Multiset inclusion """
return all( v <= other[k] for k,v in self.items() )
def __lt__(self, other):
""" Multiset strict inclusion """
return self <= other and self != other
# TODO : __ge__ and __gt__
# Beware : they CANNOT be written in terms of __le__ or __lt__
a = PartiallyOrderedCounter('abc')
b = PartiallyOrderedCounter('ab')
c = PartiallyOrderedCounter('abe')
assert a <= a
assert not a < a
assert b <= a
assert b < a
assert not a < b
assert not c <= a
assert not a <= c