Quicksort - which sub-part should be sorted first?

Posted 2019-04-21 20:02

Question:

I am reading some text which claims this regarding the ordering of the two recursive Quicksort calls:

... it is important to call the smaller subproblem first, this in conjunction with tail recursion ensures that the stack depth is log n.

I am not at all sure what that means, why should I call Quicksort on the smaller subarray first?

Answer 1:

Look at quicksort as an implicit binary tree. The pivot is the root, and the left and right subtrees are the partitions you create.

Now consider doing a depth-first search of this tree: the recursive calls correspond exactly to a depth-first traversal of the implicit tree described above. Also assume that the tree always has the smaller sub-tree as its left child, so the suggestion is in fact to do a pre-order traversal of this tree.

Now suppose you implement the preorder traversal using a stack: you push only the left child (keeping the parent on the stack), and when the time comes to visit the right child (say you maintain a flag recording whether a node's left child has been explored yet), you replace the top of the stack instead of pushing the right child (this corresponds to the tail-recursion part).

The maximum stack depth is the maximum 'left depth': if you mark each edge going to a left child with 1 and each edge going to a right child with 0, then you are looking at the path with the maximum sum of edge labels (basically, you don't count the right edges).

Now since the left sub-tree has no more than half the elements, each time you go left (i.e. traverse an edge marked 1) you reduce the number of nodes left to explore by at least half.

Thus the maximum number of edges marked 1 that you see is no more than log n.

Thus the stack usage is no more than log n, if you always pick the smaller partition, and use tail recursion.
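To make that concrete, here is a minimal sketch (my own illustration, not code from the text being quoted) of what "replace the top of the stack" looks like in practice: the smaller partition gets the real recursive call, and the larger partition is handled by adjusting lo/hi and going around the loop again. The Lomuto-style partition helper and all the names are just illustrative choices.

    #include <utility>   // std::swap
    #include <vector>

    // Lomuto partition around the last element; returns the pivot's final index.
    // (Any correct partition scheme would do; this one is just short.)
    static int partition(std::vector<int>& a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; ++j)
            if (a[j] < pivot) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);
        return i;
    }

    // Recurse only into the smaller partition; the larger partition is handled
    // by updating lo/hi and going around the loop again, which plays the role of
    // the "replace the top of the stack" (tail call) step described above.
    // As a result the recursion depth stays around log2(n).
    static void quicksort(std::vector<int>& a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {           // left side is the smaller one
                quicksort(a, lo, p - 1);     // real recursive call: smaller side
                lo = p + 1;                  // loop (no new frame): larger side
            } else {
                quicksort(a, p + 1, hi);     // real recursive call: smaller side
                hi = p - 1;                  // loop (no new frame): larger side
            }
        }
    }

The only thing that matters for the stack bound is that the plain recursive call always gets the side with fewer elements; the choice of partition scheme is incidental.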



Answer 2:

Some languages guarantee tail-call elimination (often loosely called tail recursion). This means that if you write f(x) { ...; g(x); } then the final call to g(x) isn't implemented with a function call at all, but with a jump, so that the final call does not use any additional stack space.
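As an illustration (my own toy example, not part of the answer): the first function below has its recursive call in tail position, and a language or compiler that performs tail-call elimination effectively turns it into the second function - a loop that reuses the same stack frame. C++ compilers may do this as an optimization, but the language does not guarantee it.

    // Tail-recursive form: the call to gcd_rec is the very last thing the
    // function does, so nothing in this frame is needed after the call.
    unsigned gcd_rec(unsigned a, unsigned b) {
        if (b == 0) return a;
        return gcd_rec(b, a % b);   // tail call
    }

    // What tail-call elimination effectively produces: the call becomes a jump
    // back to the top of the function, i.e. a loop that reuses the same frame.
    unsigned gcd_loop(unsigned a, unsigned b) {
        while (b != 0) {
            unsigned r = a % b;
            a = b;
            b = r;
        }
        return a;
    }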

Quicksort splits the data to be sorted into two sections. If you always handle the shorter section first, then each call that consumes stack space works on a section at most half the size of the one its caller was working on. So if you start off with 10 elements to sort, the stack at its deepest will have a call sorting those 10 elements, then a call sorting at most 5 elements, then a call sorting at most 2 elements, then a call sorting at most 1 element; at that point, for 10 elements, the stack cannot go any deeper. The stack size is limited by the log of the data size.

If you didn't worry about this, you could end up with the stack holding a call sorting 10 elements, then a call sorting 9 elements, then a call sorting 8 elements, and so on, so that the stack ends up as deep as the number of elements to be sorted. But this can't happen with tail recursion if you sort the short sections first: although you can split 10 elements into 1 element and 9 elements, the call sorting the 9 elements is made last of all and implemented as a jump, which doesn't use any more stack space - it reuses the stack space of its caller, which was just about to return anyway.
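For contrast, the un-optimized ordering looks like the sketch below (again my own illustration, reusing the hypothetical partition helper from the sketch under the first answer). Nothing here controls which call gets the big half, so a run of lopsided partitions can pile up close to n frames.

    // Naive ordering: both sub-ranges are sorted by ordinary recursive calls,
    // in whatever order the partition happens to produce them. If the first
    // (non-tail) call keeps receiving the larger sub-range, each level leaves a
    // frame waiting underneath it, so the stack can grow about as deep as the
    // number of elements. (Whether the second call is compiled into a jump is
    // up to the compiler; C++ does not guarantee tail-call elimination.)
    static void quicksort_naive(std::vector<int>& a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quicksort_naive(a, lo, p - 1);   // may be the big half: not a tail call
        quicksort_naive(a, p + 1, hi);
    }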



Answer 3:

Ideally, the list is partitioned into two sublists of roughly equal size. In that case it doesn't matter much which sublist you work on first.

But on a bad day the list may partition in the most lopsided way possible: a sublist of two or three items, maybe four, and a sublist nearly as long as the original. This could be due to a bad choice of pivot value or wickedly contrived data. Imagine what happens if you work on the bigger sublist first: the first invocation of quicksort holds the pointers/indices for the short sublist in its stack frame while recursively calling quicksort for the long one. That long sublist also partitions badly into a very short list and a long one, we again do the longer sublist first, and so on...

Ultimately, on the baddest of bad days with the wickedest of wicked data, we'll have stack frames built up in number proportional to the original list length. This is quicksort's worst case behavior, O(n) depth of recursive calls. (Note we are talking of quicksort's depth of recursion, not performance.)

Doing the shorter sublist first gets rid of it quickly. We still process a large number of tiny lists, in proportion to the original list length, but now each one is handled by one or two shallow recursive calls. We still make O(n) calls (that's performance), but each one is made at O(1) depth.
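To see that behaviour rather than take it on faith, here is a small, hedged test harness (my own; it reuses the hypothetical partition helper and the smaller-side-first shape from the sketch under the first answer, with an extra depth argument for measurement). An already-sorted array with a last-element pivot is exactly the lopsided case: every split is n-1 versus 0.

    #include <algorithm>   // std::max
    #include <cstdio>
    #include <numeric>     // std::iota
    #include <vector>

    static int g_max_depth = 0;   // deepest simultaneous nesting observed

    // Same "recurse into the smaller side, loop on the larger side" shape as the
    // earlier sketch, with an explicit depth argument so we can measure nesting.
    static void quicksort_depth(std::vector<int>& a, int lo, int hi, int depth) {
        g_max_depth = std::max(g_max_depth, depth);
        while (lo < hi) {
            int p = partition(a, lo, hi);    // partition from the earlier sketch
            if (p - lo < hi - p) {
                quicksort_depth(a, lo, p - 1, depth + 1);
                lo = p + 1;
            } else {
                quicksort_depth(a, p + 1, hi, depth + 1);
                hi = p - 1;
            }
        }
    }

    int main() {
        std::vector<int> a(10000);
        std::iota(a.begin(), a.end(), 0);    // already sorted: every split is lopsided
        quicksort_depth(a, 0, static_cast<int>(a.size()) - 1, 1);
        // Many partitions, but each recursive call lands on a tiny (here: empty)
        // side, so the nesting stays shallow even on this pathological input.
        std::printf("max depth: %d\n", g_max_depth);   // small, nowhere near 10000
        return 0;
    }

This run still performs O(n) partitions, so it is quadratic in time on this contrived input; the point is only that the nesting, and therefore the stack, stays shallow.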



Answer 4:

Surprisingly, this turns out to be important even when quicksort is not confronted with wildly unbalanced partitions, and even when introsort is actually being used.

The problem arises (in C++) when the values in the container being sorted are really big. By this, I don't mean that they point to really big objects, but that they are themselves really big. In that case, some (possibly many) compilers will make the recursive stack frame quite big, too, because it needs at least one temporary value in order to do a swap. Swap is called inside of partition, which is not itself recursive, so you would think that the quicksort recursive driver would not require the monster stack-frame; unfortunately, partition usually ends up being inlined because it's nice and short, and not called from anywhere else.

Normally the difference between 20 and 40 stack frames is negligible, but if the values weigh in at, say, 8 KB, then the difference between 20 and 40 stack frames could mean the difference between working and stack overflow, if stacks have been reduced in size to allow for many threads.

If you use the "always recurse into the smaller partition" algorithm, the stack cannot ever exceed log2(N) frames, where N is the number of elements in the vector. Furthermore, N cannot exceed the amount of memory available divided by the size of an element. So on a 32-bit machine there could be at most 2^19 elements of 8 KB each in a vector, and the quicksort call depth could not exceed 19.
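A quick, purely illustrative check of that arithmetic:

    #include <cmath>
    #include <cstdio>

    int main() {
        // 32-bit address space: at most 2^32 bytes; elements of 8 KB = 2^13 bytes,
        // so at most 2^32 / 2^13 = 2^19 elements can exist at once, and the
        // smaller-partition-first recursion can be at most about 19 frames deep.
        const double address_space = std::pow(2.0, 32);
        const double element_size  = 8.0 * 1024;
        const double max_elements  = address_space / element_size;
        std::printf("max elements: %.0f, max depth: %.0f\n",
                    max_elements, std::log2(max_elements));   // 524288 and 19
        return 0;
    }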

In short, writing quicksort correctly makes its stack usage predictable (as long as you can predict the size of a stack frame). Not bothering with the optimization (to save a single comparison!) can easily cause the stack depth to double even in non-pathological cases, and in pathological cases it can get a lot worse.