Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
What about trying to create a worst case of O(n) complexity?
Most of the algorithms in use today are O(n log n) on average and O(n^2) in the worst case.
Some, while using more memory, are O(n log n) in the worst case.
Can you create such an algorithm if there is no limitation on memory usage?
What if your memory is limited? How will this hurt your algorithm?
Any page on the intertubes that deals with comparison-based sorts will tell you that you cannot sort faster than O(n lg n) with comparison sorts. That is, if your sorting algorithm decides the order by comparing 2 elements against each other, you cannot do better than that. Examples include quicksort, bubblesort, mergesort.
Some algorithms, like count sort, bucket sort or radix sort, do not use comparisons. Instead, they rely on the properties of the data itself, like the range of values in the data or the size of the data value.
Those algorithms can have better complexities. Here is an example scenario:
You are sorting 10^6 integers, and each integer is between 0 and 10. Then you can just count the number of zeros, ones, twos, etc. and spit them back out in sorted order. That is how countsort works, in O(n + m), where m is the number of values your datum can take (in this case, m=11).
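The counting idea above can be sketched in a few lines of Python. This is only an illustration: the function name `counting_sort` and its parameter `m` (the number of distinct values, so valid inputs lie in 0..m-1) are made up for this example.

```python
def counting_sort(values, m):
    """Sort integers in the range 0..m-1 in O(n + m) time.

    `m` is the number of distinct values an element can take
    (hypothetical parameter name, chosen to match the text).
    """
    counts = [0] * m
    for v in values:
        if not 0 <= v < m:
            raise ValueError(f"value {v} is outside the range 0..{m - 1}")
        counts[v] += 1                      # one pass over the input: O(n)

    result = []
    for value, count in enumerate(counts):  # one pass over the counts: O(m)
        result.extend([value] * count)      # emit each value `count` times
    return result


print(counting_sort([3, 10, 0, 7, 3, 10, 1], 11))
# prints [0, 1, 3, 3, 7, 10, 10]
```

No two elements are ever compared with each other, which is why the O(n lg n) bound for comparison sorts does not apply here.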
Another:
You are sorting 10^6 binary strings that are all at most 5 characters in length (padded to 5 characters if shorter). You can use the radix sort for that: first split them into 2 buckets depending on their last character, then repeat for the fourth character, third, second and first. As long as each step is a stable sort, you should end up with a perfectly sorted list in O(nm), where m is the number of digits or bits in your datum (in this case, m=5).
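Here is a rough sketch of a radix sort for this scenario, in its least-significant-digit form: it processes the last character first and relies on every pass being stable. It assumes that all strings have exactly `width` characters and contain only '0' and '1'. The function name and the `width` parameter are invented for the example.

```python
def radix_sort_binary_strings(strings, width):
    """LSD radix sort for strings of exactly `width` characters,
    each character being '0' or '1'. Runs in O(n * width)."""
    items = list(strings)
    # Process positions from the last character to the first.
    for pos in range(width - 1, -1, -1):
        zeros, ones = [], []
        for s in items:
            # Appending preserves the current relative order,
            # which is what makes each pass stable.
            if s[pos] == "1":
                ones.append(s)
            else:
                zeros.append(s)
        items = zeros + ones
    return items


data = ["10110", "00011", "11111", "00000", "10101", "00011"]
print(radix_sort_binary_strings(data, 5))
# prints ['00000', '00011', '00011', '10101', '10110', '11111']
```

Each pass is a single scan into two buckets, so the total work is n times the number of characters.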
But in the general case, you cannot sort faster than O(n lg n) reliably (using a comparison sort).
I'm not quite happy with the accepted answer so far. So I'm retrying an answer:
Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
The answer to this question depends on the machine that would execute the sorting algorithm. If you have a random access machine that can operate on exactly 1 bit, you can do radix sort for integers with at most k bits, which was already suggested. So you end up with complexity O(kn).
But if you are operating on a fixed-size word machine with a word size of at least k bits (which all consumer computers are), the best you can achieve is O(n log n). This is because either log n < k, or you could do a count sort first and then sort with an O(n log n) algorithm, which would yield the first case also.
What about trying to create a worst case of O(n) complexity?
That is not possible. A link was already given. The idea of the proof is that in order to sort, you have to decide for every element whether it is larger or smaller than every other element to be sorted. Using transitivity, this can be represented as a binary decision tree. The tree needs at least n! leaves, one per possible ordering of the input, so its depth is at least log2(n!), which is Θ(n log n). So if you want performance better than O(n log n), you have to remove edges from that decision tree. But if the decision tree is not complete, then how can you make sure that you have made a correct decision about some elements a and b?
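For reference, the counting argument behind the decision-tree bound can be written out as follows. This is the standard textbook derivation for comparison sorts, with d denoting the depth of the tree (the worst-case number of comparisons):

```latex
% A binary tree of depth d has at most 2^d leaves, and a correct
% comparison sort needs one leaf per permutation of the n inputs.
\[
  2^{d} \ge n!
  \quad\Longrightarrow\quad
  d \ge \log_2 (n!)
\]
% The largest n/2 factors of n! are each at least n/2, hence:
\[
  \log_2 (n!) \;\ge\; \log_2\!\left( \left(\tfrac{n}{2}\right)^{n/2} \right)
  \;=\; \frac{n}{2} \log_2 \frac{n}{2}
  \;=\; \Omega(n \log n)
\]
```

The bound only counts comparisons, so it says nothing about algorithms that inspect the bits or digits of the keys directly.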
Can you with no limitation on memory usage create such an algorithm?
As argued above, that is not possible, so the remaining questions are of no relevance.
If the integers are distinct and lie in a limited range of m possible values, then an O(n) "sort" of them would involve a bit vector of m bits: loop over the integers in question and, for each integer x, set bit x%8 of the byte at offset x//8 in that byte array to true. That is an O(n) operation. Another loop over the bit array to list/enumerate/return/print all the set bits is O(m), which is likewise O(n) when the range is proportional to n. (Naturally O(2n) is reduced to O(n).)
This is a special case where the range is small enough for the bit vector to fit within memory or in a file (with seek() operations). It is not a general solution, but it is described in Bentley's "Programming Pearls" and was allegedly a practical solution to a real-world problem (involving something like a "freelist" of telephone numbers; something like: find the first available phone number that could be issued to a new subscriber).
(Note: log2(10**10) is ~33.2, so 34 bits are enough to represent every possible integer up to 10 digits in length. A bit vector covering that whole range needs 10**10 bits, about 1.25 GB, so it fits in the 2**31 bytes of a typical Unix/Linux maximum sized memory mapping.)
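A minimal sketch of the bit-vector approach, assuming the integers are distinct and fall in the range 0..limit-1 (duplicates would collapse into a single set bit). The name `bitmap_sort` and the `limit` parameter are made up for this illustration and are not taken from Bentley's book.

```python
def bitmap_sort(values, limit):
    """Return the distinct integers from `values` (each in 0..limit-1)
    in ascending order, using one bit per possible value."""
    bits = bytearray((limit + 7) // 8)      # limit bits, rounded up to whole bytes
    for x in values:
        if not 0 <= x < limit:
            raise ValueError(f"value {x} is outside the range 0..{limit - 1}")
        bits[x // 8] |= 1 << (x % 8)        # set bit x%8 of the byte at offset x//8

    # Scan the whole range and report every value whose bit is set.
    return [x for x in range(limit) if bits[x // 8] & (1 << (x % 8))]


print(bitmap_sort([9, 2, 15, 0, 7], 16))
# prints [0, 2, 7, 9, 15]
```

The first loop costs O(n) and the final scan costs O(limit), so the whole thing is linear only when the range is not much larger than the number of inputs.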
I believe you are looking for radix sort.