How can I retrieve the two highest items from a list containing 100,000 integers without having to sort the entire list first?
This will work, but I don't know if you want to retain the items in the list:
If you do, you can do this:
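The code for this answer is not shown here, so the following is only a guess at the kind of approach being described: take the maximum, remove it, and take the maximum again. The function names are hypothetical. The second function works on a copy so the original list is retained.

```python
def two_highest_destructive(items):
    """Return the two highest items; removes the highest from items."""
    highest = max(items)
    items.remove(highest)  # mutates the caller's list
    second = max(items)
    return highest, second


def two_highest_keep(items):
    """Same idea, but operate on a copy so the original list is kept."""
    return two_highest_destructive(list(items))


data = [8, 3, 10, 9]
print(two_highest_keep(data))         # data is left unchanged
print(two_highest_destructive(data))  # data loses one element
```

Both variants make two linear passes over the list instead of sorting it.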
"2 highest" is impossible; only one item can be "highest". Perhaps you mean "highest 2". In any case, you need to say what to do when the list contains duplicates. What do you want from [8, 9, 10, 10]: (10, 9) or (10, 10)? If your response is (10, 10), please consider input of [8, 9, 10, 10, 10]. What are you going to do with the "highest two" when you've got them? Please edit your question to give this guidance.
In the meantime, here's an answer that takes the first approach (two unique values):
You should add guards against fewer than 2 unique values in the list.
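A minimal sketch of the "two unique values" reading, including the guard mentioned above. The function name is hypothetical, and building a set is just one way to discard duplicates; it is not necessarily what the original snippet did.

```python
def highest_two_unique(items):
    """Return the two largest distinct values as (highest, second)."""
    unique = set(items)  # duplicates collapse to a single value
    if len(unique) < 2:
        raise ValueError("need at least two unique values")
    highest = max(unique)
    unique.remove(highest)
    return highest, max(unique)


print(highest_two_unique([8, 9, 10, 10]))  # (10, 9)
```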
The second highest item is a fairly simple case, but for the kth highest item what you want is a selection algorithm. That page is pretty thorough so it's probably best just to read that.
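As a rough illustration of what a selection algorithm looks like, here is a sketch of quickselect, one common choice. The function name and the convention that k=1 means the maximum are assumptions made for this example; duplicates are counted as separate items.

```python
import random


def kth_highest(items, k):
    """Return the k-th highest item (k=1 is the maximum) using quickselect."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        higher = [x for x in pool if x > pivot]
        equal = [x for x in pool if x == pivot]
        if k <= len(higher):
            # The answer is among the items above the pivot
            pool = higher
        elif k <= len(higher) + len(equal):
            # The answer is the pivot value itself
            return pivot
        else:
            # Skip everything at or above the pivot and keep searching below
            k -= len(higher) + len(equal)
            pool = [x for x in pool if x < pivot]


print(kth_highest([8, 9, 10, 10], 3))  # 9
```

On average this runs in linear time, since each round discards part of the list rather than ordering all of it.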
A really slick way is to use heapq. Heapify the array (O(n)), then just pop as many elements as you need (O(log n) per pop). (Saw this question in an interview once, good question to keep in mind.)
JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.
If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targeted at someone new to programming.
The idea
The idea is simple.
- Keep two variables, largest and second_largest.
- If an item is larger than largest, assign it to largest.
- If an item is larger than second_largest, but less than largest, assign it to second_largest.
Getting started
Let's start.
Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?
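The original snippet is not shown here. A sketch of what this first attempt might look like, following the steps listed under "The idea", together with the error it produces (the test list is an arbitrary choice):

```python
def two_largest(inlist):
    """First attempt: largest and second_largest are never initialized."""
    for item in inlist:
        if item > largest:
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    return largest, second_largest


try:
    two_largest([3, 2, 1])
except UnboundLocalError as error:
    # Raised on the very first comparison, because largest has no value yet
    print("UnboundLocalError:", error)
```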
Apparently, we need to set largest before we start the loop. This probably means we should set second_largest too.
Initializing variables
Let's set largest and second_largest to 0.
Good. Let's run it.
Great! Now let's test with inlist being [1, 2, 3].
Let's try it.
...Uh oh.
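A sketch of the zero-initialized version at this stage, reconstructed from the description rather than copied from the original. The first test list, [3, 2, 1], is an assumption; the second is the one named above.

```python
def two_largest(inlist):
    """Second attempt: both variables start at 0."""
    largest = 0
    second_largest = 0
    for item in inlist:
        if item > largest:
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    return largest, second_largest


print(two_largest([3, 2, 1]))  # (3, 2)
print(two_largest([1, 2, 3]))  # (3, 0)
```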
Fixing the logic
The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?
Let's work through what the function is doing.
- At the start, largest is 0 and second_largest is also 0.
- After the first item, largest becomes 1.
- After the second item, largest becomes 2.
But what about second_largest?
When we assign a new value to largest, the largest value actually becomes second-largest. We need to show that in the code.
Let's run it.
Fantastic.
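A sketch of the function with that change applied, assuming the variables are still initialized to 0 at this point in the walkthrough:

```python
def two_largest(inlist):
    """Third attempt: the old largest is handed down to second_largest."""
    largest = 0
    second_largest = 0
    for item in inlist:
        if item > largest:
            second_largest = largest  # the old largest is now second
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    return largest, second_largest


print(two_largest([1, 2, 3]))  # (3, 2)
```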
Initializing variables, part 2
Now let's try it with a list of negative numbers.
Let's run it.
That's not right at all. Where did these zeroes come from?
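To reproduce the zeroes, here is the same zero-initialized sketch run on a list of negative numbers. The particular list is an assumption; any all-negative list behaves the same way.

```python
def two_largest(inlist):
    """Zero-initialized version, as in the previous step."""
    largest = 0
    second_largest = 0
    for item in inlist:
        if item > largest:
            second_largest = largest
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    return largest, second_largest


# No negative item is ever greater than 0, so neither variable changes
print(two_largest([-1, -2, -3]))  # (0, 0)
```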
It turns out that the starting values for largest and second_largest were actually larger than all the items in the list. The first thing you might consider is setting largest and second_largest to the lowest values possible in Python. Unfortunately, Python integers don't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.
So what's the best thing to do? Let's try setting largest and second_largest to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.
Let's run it.
Great! Let's try with another list of negative numbers.
Let's run it.
Wait, what?
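A sketch of this stage, with the variables seeded from the first two items. The first test list is an assumption; the second, [-3, -2, -1], matches the values traced in the next part.

```python
def two_largest(inlist):
    """Fourth attempt: seed from the first two items, scan the rest."""
    largest = inlist[0]
    second_largest = inlist[1]
    for item in inlist[2:]:  # skip the two items already used as seeds
        if item > largest:
            second_largest = largest
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    return largest, second_largest


print(two_largest([-1, -2, -3]))  # (-1, -2)
print(two_largest([-3, -2, -1]))  # (-1, -3)
```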
Initializing variables, part 3
Let's step through our logic again.
- largest is set to -3
- second_largest is set to -2
Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.
- largest is set to -1; second_largest is set to the old value of largest, which is -3
Yes, that looks to be the problem. We need to ensure that largest and second_largest are set correctly.
Let's run it.
Excellent.
Conclusion
So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.
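The final listing is not shown here, so the following is a reconstruction from the steps above rather than the author's exact code. Two details are my own assumptions: the length check at the top, and the use of a plain item > second_largest test in the second branch, chosen so that repeated maximum values are handled the same way wherever they appear in the list.

```python
def two_largest(inlist):
    """Return the two largest items of inlist as (largest, second_largest).

    The sequence must contain at least two items.
    """
    if len(inlist) < 2:
        raise ValueError("need at least two items")

    # Seed the variables from the first two items, in the right order
    largest, second_largest = inlist[0], inlist[1]
    if largest < second_largest:
        largest, second_largest = second_largest, largest

    # Only look at the part of the list after the second item
    for item in inlist[2:]:
        if item > largest:
            # The old largest becomes the second largest
            second_largest = largest
            largest = item
        elif item > second_largest:
            second_largest = item

    return largest, second_largest


print(two_largest([1, 2, 3]))     # (3, 2)
print(two_largest([-3, -2, -1]))  # (-1, -2)
```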
However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.
Efficiency
Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100,000 items can be processed in 0.27 seconds (using timeit, averaged over 100 runs).
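For comparison, the heapq suggestion from the earlier answer can be sketched with the standard library's heapq.nlargest. Note that heapq implements a min-heap, so heapifying and popping would yield the smallest items first; nlargest takes care of that detail. The random test data here is purely illustrative.

```python
import heapq
import random

# Illustrative data: 100,000 random integers
data = [random.randrange(1_000_000) for _ in range(100_000)]

# nlargest returns a list sorted from largest to smallest
largest, second_largest = heapq.nlargest(2, data)
print(largest, second_largest)
```

Duplicates are counted as separate items here, so heapq.nlargest(2, [8, 9, 10, 10]) gives [10, 10].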