In Python, I have a list:
L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
I want to identify the item that occurred the highest number of times. I am able to solve it but I need the fastest way to do so. I know there is a nice Pythonic answer to this.
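One widely used answer relies on `collections.Counter` from the standard library (available from Python 2.7). A minimal sketch, not necessarily the fastest option for every input:

```python
from collections import Counter

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

# most_common(1) returns a one-element list of (item, count) pairs.
item, count = Counter(L).most_common(1)[0]
print(item, count)  # 4 6
```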
For older Python versions (< 2.7), you can use this recipe to get the `Counter` class.
I obtained the best results with `groupby` from the `itertools` module with this function using Python 3.5.2:
Output:
Test with `timeit` from the `timeit` module. I used this script for my test with `number=20000`:
Output (the best one):
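The function, the timing script, and their outputs are not reproduced here. As a rough sketch of the idea only, the code below sorts the list so that equal items become adjacent, groups them with `itertools.groupby`, and times the call with `timeit`. The helper name `most_common_groupby` is hypothetical and this is not the answer's original code.

```python
from itertools import groupby
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]


def most_common_groupby(items):
    """Return (item, count) for the most frequent item (hypothetical helper)."""
    # groupby only merges adjacent equal items, so sort first.
    runs = ((key, sum(1 for _ in group)) for key, group in groupby(sorted(items)))
    # max returns the first maximal pair, i.e. the smallest item on a tie.
    return max(runs, key=lambda pair: pair[1])


print(most_common_groupby(L))  # (4, 6)

# Time 20000 calls, the repeat count mentioned in the answer.
elapsed = timeit(lambda: most_common_groupby(L), number=20000)
print("%.3f s for 20000 calls" % elapsed)
```

The elapsed time depends on the machine and Python version, so no figure is quoted here.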
A simple way without any libraries or sets
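The code for this answer is not shown. One possible reading, offered only as a sketch, is a single pass that counts into a plain `dict` and tracks the best item seen so far. The function name `most_common_plain` is hypothetical.

```python
def most_common_plain(items):
    """Return (item, count) for the most frequent item, using only built-ins."""
    counts = {}
    best_item, best_count = None, 0
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        # Strict '>' keeps the item that reached the top count first on ties.
        if counts[item] > best_count:
            best_item, best_count = item, counts[item]
    return best_item, best_count


L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
print(most_common_plain(L))  # (4, 6)
```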
Here is a `defaultdict` solution that will work with Python versions 2.5 and above:
Note that if
L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67]
then there are six 4s and six 7s. However, the result will be (4, 6), i.e. six 4s.
In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.
Here's a simple test of several different implementations:
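The original test script is not reproduced here. A comparable harness might look like the sketch below, written for Python 3 (which has no `iteritems()`, so `items()` is used). All function names, the `run_benchmark` helper, and the repeat count of 10000 are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]


def with_counter(items):
    return Counter(items).most_common(1)[0]


def with_groupby(items):
    # groupby needs equal items to be adjacent, hence the sort.
    runs = ((key, len(list(group))) for key, group in groupby(sorted(items)))
    return max(runs, key=itemgetter(1))


def with_defaultdict(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return max(counts.items(), key=itemgetter(1))


def with_dict_get(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return max(counts.items(), key=itemgetter(1))


CANDIDATES = (with_counter, with_groupby, with_defaultdict, with_dict_get)


def run_benchmark(items, number):
    """Return {function name: seconds taken for `number` calls}."""
    return {f.__name__: timeit(lambda: f(items), number=number) for f in CANDIDATES}


# Print the candidates from fastest to slowest on this machine.
for name, seconds in sorted(run_benchmark(L, 10000).items(), key=itemgetter(1)):
    print("%-18s %.4f s" % (name, seconds))
```

The ranking this prints will vary with the interpreter version and hardware, which is the reason to run it rather than guess.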
The results on my machine:
So it appears that the `Counter` solution is not the fastest. And, in this case at least, `groupby` is faster. `defaultdict` is good but you pay a little bit for its convenience; it's slightly faster to use a regular `dict` with a `get`.
What happens if the list is much bigger? Adding
L *= 10000
to the test above and reducing the repeat count to 200:
Now `defaultdict` is the clear winner. So perhaps the cost of the `get` method and the loss of the in-place add adds up (an examination of the generated code is left as an exercise).
But with the modified test data, the number of unique item values did not change, so presumably `dict` and `defaultdict` have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:
So now `Counter` is clearly faster than the `groupby` solutions but still slower than the `iteritems` versions of `dict` and `defaultdict`.
The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
Bottom line: it all depends and you need to measure.
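To see how the shape of the input changes the picture, the same kind of measurement can be repeated on a long list with few distinct values and on a long list with many distinct values. This is an illustration only: the replacement initialization of L used in the answer is not shown, so the many-unique list built with `random.randrange` is an assumption, as are the helper names and the repeat count of 3.

```python
import random
from collections import Counter, defaultdict
from operator import itemgetter
from timeit import timeit

base = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]


def with_counter(items):
    return Counter(items).most_common(1)[0]


def with_defaultdict(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return max(counts.items(), key=itemgetter(1))


def with_dict_get(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return max(counts.items(), key=itemgetter(1))


random.seed(0)  # fixed seed so repeated runs see the same data
inputs = {
    "small": base,
    # Same effect as L *= 10000: a long list with few unique values.
    "repeated": base * 10000,
    # Assumed stand-in for "many unique items"; not the answer's own data.
    "many unique": [random.randrange(100000) for _ in range(len(base) * 10000)],
}

for label, data in inputs.items():
    for func in (with_counter, with_defaultdict, with_dict_get):
        seconds = timeit(lambda: func(data), number=3)
        print("%-12s %-18s %.4f s" % (label, func.__name__, seconds))
```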
Maybe something like this:
testList = [1, 2, 3, 4, 2, 2, 1, 4, 4]
print(max(set(testList), key=testList.count))