What is a Deterministic Quicksort?

2020-08-17 05:29发布

问题:

I have been reading about Quicksort and found that sometimes it' s referred to as "Deterministic Quicksort".

Is this an alternate version of the normal Quicksort ? What is the difference between a normal Quicksort and a Deterministic Quicksort ?

回答1:

The ordinary ("deterministic") Quicksort can have very poor behaviour on particular datasets (as an example, an implementation that picks the first unsorted element has O(n^2) time complexity on already-sorted data).

Randomized Quicksort (which selects a random pivot, rather than choosing deterministically) is sometimes used to give better expected performance over all datasets.



回答2:

Quicksort runs in O(n log n) expected/average time, but O(n^2) worst case. This occurs if the pivot chosen is consistently either the minimum or the maximum.

Ideally, you want to select the median as your pivot. If finding the median directly is too costly (usually this is the case if you're trying to use quicksort), what's commonly done instead is to either take the median of three potential pivot elements, or else just pick a random element as your pivot.

The latter method renders quicksort nondeterministic because of the randomness inherent to the pivot selection process.



回答3:

In general, a sort algorithm is "deterministic" if it consistently sorts the elements in the exact same order every time. Given a set of records to sort on id (asc):

  1 Censu
  11 Marju
  4  Cikku
  11 Lonzu

then a sorting algorithm could return both Censu, Cikk, Marju, Lonzu, or Censu, Cikku, Lonzu, Marju, as correct sortings. A deterministic sort is one which always returns the same ordering. This needn't always be the case. In the case of quicksort, one can get faster average performance if pivots are chosen randomly (ideally you would choose the median, but this can be costly). However, this comes at a cost: your search is no longer deterministic.



回答4:

Your source can (and should) give its own definition, but generally a deterministic quicksort is one where the pivot is chosen through a formula that doesn't depend on random numbers. For example, always pick the middle element or always the first, or something like this. This means that its performance will always be the same (in theory anyway, although in practice the difference shouldn't be too big) no matter how many times you run it on the same input. A randomised quicksort means that you are using random numbers when choosing the pivot, meaning the performance cannot be (easily) predicted for different runs on the same input.



回答5:

It has to do with the partitioning (or the divide step from the famed Divide and Conquer which is used in Quick sort). If every time the last (or first element or element at any position, just that it has to be the same position every time the data set is divided) is used as the pivot to partition then it is Deterministic Quick sort. If the pivot is picked at random then it is Randomized quick sort.

Here is a lecture note which puts it across.

I hope it helps

cheers



回答6:

Common adjectives in front of quicksort are deterministic and randomized. Deterministic means that the quicksort will always sort the same set of data in the same fashion while a randomized quicksort uses randomization and will rarely sort the same data in the same exact fashion (unless the data set is very small - then it is more common).

Deterministic

It comes down to how the pivots are chosen. In a deterministic quicksort, the pivots are chosen by either always choosing the pivot at the same relative index such as the first, last, or middle element or by using the median of any number of predetermined element choices. For instance, a common method is to choose the median of first, last, and middle elements as the pivot. Even with the median-of-3 method I just described, certain datasets can easily give O(N^2) time complexity. An example dataset is the so-called organ pipes set of data:

array = [1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1]

Randomized

Randomizated quicksorts can choose just a random pivot or use the median of some number of randomly chosen pivots. There is still the possibility of O(N^2) time complexity, but the probability is much, much smaller and becomes smaller with increasing dataset size.



回答7:

Besides what many others have already told you about how a deterministic quick sort is implemented and a non-deterministic one is, I believe one, much more important aspect of such sort, is that, with deterministic quicksort, you always have the same order of records when keys clash, while with non-deterministic quicksorts, the order of such records can be different each time you run the sort.

I guess you should not use non-deterministic quicksorting when you have non-unique keys.