For example, I need to count how many times each word appears in a list, with the result ordered not by frequency but by the order in which the words first appear, i.e. insertion order.
from collections import Counter
words = ['oranges', 'apples', 'apples', 'bananas', 'kiwis', 'kiwis', 'apples']
c = Counter(words)
print(c)
So instead of: {'apples': 3, 'kiwis': 2, 'bananas': 1, 'oranges': 1}
I'd rather get: {'oranges': 1, 'apples': 3, 'bananas': 1, 'kiwis': 2}
And I don't really need this Counter method; any way that produces the correct result is OK for me.
You can use the recipe that combines collections.Counter and collections.OrderedDict:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
words = ["oranges", "apples", "apples", "bananas", "kiwis", "kiwis", "apples"]
c = OrderedCounter(words)
print(c)
# OrderedCounter(OrderedDict([('oranges', 1), ('apples', 3), ('bananas', 1), ('kiwis', 2)]))
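As a quick check that the recipe still behaves like a Counter, here is a small sketch. The extra words passed to update are made up for illustration, and the snippet inspects items directly rather than the repr, whose exact format can differ between Python versions.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)


words = ["oranges", "apples", "apples", "bananas", "kiwis", "kiwis", "apples"]
c = OrderedCounter(words)

# Iteration follows first-encounter order, not frequency.
in_order = list(c.items())

# Counter methods are still available.
top = c.most_common(1)

# Updating keeps existing keys in place and appends unseen ones at the end.
c.update(["kiwis", "pears"])  # illustrative extra data, not from the question
after_update = list(c)
```

Here in_order is the list of (word, count) pairs in the order the words first appeared, and after_update ends with 'pears' because it was the last new key seen.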
On Python 3.6+, dict maintains insertion order, so you can do:
words = ["oranges", "apples", "apples", "bananas", "kiwis", "kiwis", "apples"]
counter = {}
for w in words:
    counter[w] = counter.get(w, 0) + 1
>>> counter
{'oranges': 1, 'apples': 3, 'bananas': 1, 'kiwis': 2}
Unfortunately, Counter in Python 3.6 and 3.7 does not display the insertion order that it maintains; instead, its __repr__ lists the items from most to least common.
You can, however, reuse the OrderedDict recipe with the Python 3.6+ dict in place of OrderedDict:
from collections import Counter
class OrderedCounter(Counter, dict):
    'Counter that remembers the order elements are first encountered'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, dict(self))
    def __reduce__(self):
        return self.__class__, (dict(self),)
>>> OrderedCounter(words)
OrderedCounter({'oranges': 1, 'apples': 3, 'bananas': 1, 'kiwis': 2})
Or, since Counter is a subclass of dict, which maintains insertion order in Python 3.6+, you can simply avoid Counter's __repr__ by either calling .items() on the counter or turning the counter back into a dict:
>>> c=Counter(words)
This presentation of the Counter uses Counter's __repr__ method and is sorted from most common element to least:
>>> c
Counter({'apples': 3, 'kiwis': 2, 'oranges': 1, 'bananas': 1})
This presentation is as encountered, or insertion order:
>>> c.items()
dict_items([('oranges', 1), ('apples', 3), ('bananas', 1), ('kiwis', 2)])
Or,
>>> dict(c)
{'oranges': 1, 'apples': 3, 'bananas': 1, 'kiwis': 2}
In Python 3.6, dictionaries are insertion ordered, but this is an implementation detail.
In Python 3.7+, insertion order is guaranteed and can be relied upon. See "Are dictionaries ordered in Python 3.6+?" for more details.
So, depending on your Python version, you may wish to just use Counter as is, without creating an OrderedCounter class as described in the documentation. This works because Counter is a subclass of dict, i.e. issubclass(Counter, dict) returns True, and therefore inherits the insertion ordering behaviour of dict.
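A minimal sketch of that point, assuming Python 3.7 or later and reusing the word list from the question: a plain Counter already iterates in first-encounter order, so no subclass is needed when only the ordering of keys matters.

```python
from collections import Counter

words = ["oranges", "apples", "apples", "bananas", "kiwis", "kiwis", "apples"]
c = Counter(words)

# Counter is a dict subclass, so it inherits dict's insertion ordering.
is_dict_subclass = issubclass(Counter, dict)

# Keys and items come back in the order each word was first seen.
keys_in_order = list(c)
items_in_order = list(c.items())

# Sorting by count is still available on request.
by_count = c.most_common()
```

Only the repr and most_common() reorder by count; ordinary iteration does not.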
String representation
It is worth noting that the string representation for Counter, as defined in its __repr__ method, has not been updated to reflect the change in 3.6 / 3.7, i.e. print(Counter(some_iterable)) still shows items from the largest count descending. You can trivially get the keys in insertion order via list(Counter(some_iterable)).
Here are some examples demonstrating the behaviour:
x = 'xyyxy'
print(Counter(x)) # Counter({'y': 3, 'x': 2}), i.e. most common first
print(list(Counter(x))) # ['x', 'y'], i.e. insertion ordered
print(OrderedCounter(x)) # OC(OD([('x', 2), ('y', 3)])), i.e. insertion ordered
Exceptions
You should not use a regular Counter if additional or overridden methods available to OrderedCounter are important to you. Of particular note:
- OrderedDict and consequently OrderedCounter offer an order-aware popitem (it accepts a last argument) and a move_to_end method.
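- Equality tests between OrderedCounter objects are order-sensitive when OrderedDict's __eq__ is the one used, in which case they are implemented as list(oc1.items()) == list(oc2.items()). Since Python 3.10, Counter defines its own __eq__ that compares counts only and comes first in the method resolution order, so this difference applies to earlier versions.
For example, on Python versions before 3.10, equality tests will yield different results:
Counter('xy') == Counter('yx') # True
OrderedCounter('xy') == OrderedCounter('yx') # False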
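The order-aware methods mentioned in the list above can be sketched as follows, assuming the OrderedDict-based OrderedCounter recipe (repeated here in a shortened form so the snippet is self-contained). The input string is made up for illustration.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'


oc = OrderedCounter('xyyxyz')  # counts: x=2, y=3, z=1
order_before = list(oc)

# move_to_end comes from OrderedDict; a plain Counter does not have it.
oc.move_to_end('x')
order_after_move = list(oc)

# OrderedDict.popitem accepts last=False to pop the oldest entry instead
# of the newest one.
first_key, first_count = oc.popitem(last=False)
order_after_pop = list(oc)

plain_counter_has_move_to_end = hasattr(Counter, 'move_to_end')
```

If neither method is needed, the plain Counter approach is the simpler choice.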