I was looking at this PyCon talk (34:30), and the speaker says that getting the t largest elements of a list of n elements can be done in O(t + n).

How is that possible? My understanding is that creating the heap will be O(n), but what's the complexity of nlargest itself? Is it O(n + t) or O(t) (and what's the actual algorithm)?
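For concreteness, this is the call I'm asking about, with a toy list (values chosen just for illustration):

```python
import heapq

data = [5, 1, 9, 3, 7, 2, 8]    # n = 7 elements
t = 3
print(heapq.nlargest(t, data))  # [9, 8, 7] -- the t largest, in descending order
```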
The speaker is wrong in this case. The actual cost is O(n * log(t)). Heapify is called only on the first t elements of the iterable. That's O(t), but is insignificant if t is much smaller than n. Then all the remaining elements are added to this "little heap" via heappushpop, one at a time. That takes O(log(t)) time per invocation of heappushpop. The length of the heap remains t throughout. At the very end, the heap is sorted, which costs O(t * log(t)), but that's also insignificant if t is much smaller than n.
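In code, that approach looks roughly like this. It's a simplified sketch, not CPython's exact implementation: it ignores the key= argument, the tie-breaking needed for stability, and the special-case fast paths.

```python
import heapq
import itertools

def nlargest_sketch(t, iterable):
    it = iter(iterable)
    # Min-heap of the t largest elements seen so far: heapify is O(t).
    heap = list(itertools.islice(it, t))
    heapq.heapify(heap)
    # Push each remaining element and pop the current minimum: O(log t) per
    # call, so O(n * log t) overall. The heap never grows past t items.
    for elem in it:
        heapq.heappushpop(heap, elem)
    # Sort the t survivors, largest first: O(t * log t).
    return sorted(heap, reverse=True)
```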
Fun with Theory ;-)

There are reasonably easy ways to find the t-th largest element in expected O(n) time; for example, see here. There are harder ways to do it in worst-case O(n) time. Then, in another pass over the input, you could output the t elements >= the t-th largest (with tedious complications in case of duplicates). So the whole job can be done in O(n) time.
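Here's a sketch of that selection-based idea, using quickselect for the expected-O(n) selection step (the helper and its names are just for illustration; the duplicate handling is the "tedious complication" mentioned above, and it assumes 1 <= t <= n):

```python
import random

def t_largest_via_selection(t, data):
    data = list(data)                # needs all n elements in memory

    def kth_largest(items, k):
        # Quickselect: expected O(len(items)) to find the k-th largest (1-based).
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        if k <= len(greater):
            return kth_largest(greater, k)
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(greater) + equal_count:
            return pivot
        smaller = [x for x in items if x < pivot]
        return kth_largest(smaller, k - len(greater) - equal_count)

    cutoff = kth_largest(data, t)
    # Second pass: keep everything strictly above the cutoff, then pad with
    # copies of the cutoff value until we have exactly t elements.
    result = [x for x in data if x > cutoff]
    result += [cutoff] * (t - len(result))
    return result
```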
But those ways require O(n) memory too. Python doesn't use them. An advantage of what's actually implemented is that the worst-case "extra" memory burden is O(t), and that can be very significant when the input is, for example, a generator producing a great many values.
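For example, this never materializes all ten million squares (the numbers are just for illustration): nlargest only ever keeps a heap of t = 10 candidates, so the extra memory stays O(t).

```python
import heapq

# The generator is consumed lazily, one value at a time.
top_ten = heapq.nlargest(10, (x * x for x in range(10_000_000)))
```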