I am trying to implement radix sort for integers, including negative integers. For non-negative ints, I was planning to create a queue of 10 queues correspondingly for the digits 0-9 and implement the LSD algorithm. But I was kind of confused with negative integers. What I am thinking now, is to go ahead and create another queue of 10 queues for them and separately sort them and then at the end, I will gave 2 lists, one containing negative ints sorted and the other containing non-negative ints. And finally I would merge them.
What do you think about this? Is there more efficient way to handle with negative integers?
Thank you!
Your radix sort wont be faster than the famous comparison sorts if you dont use "bitshift" and "bitwise AND" for radix calculation.
Computers use 2's complement to represent signed numbers, here the sign-bit lies at the leftmost end of a binary digit, in memory representation
eg
436163157 (as 32 bit number) = 00011001 11111111 01010010 01010101
-436163157 (as 32 bit number) = 11100110 00000000 10101101 10101011
1 (as 32 bit number) = 00000000 00000000 00000000 00000001
-1 (as 32 bit number) = 11111111 1111111 1111111 11111111
0 is represented as = 00000000 00000000 00000000 00000000
Highest negative value as = 10000000 00000000 00000000 00000000
So you see, the more negative a number becomes, it looses that many 1's, a small negative number has many 1's, if you set only the sign-bit to 0, it becomes a very large positive number. Vice versa a small positive number becomes a large negative number.
In radix sort the key to sorting negative numbers is how you handle the last 8 bits, for negative numbers at least the last bit has to be 1, in 32-bit scheme it has to be from
10000000 00000000 00000000 00000000 which is the most negative value farthest from zero to 11111111 11111111 11111111 11111111 which is -1. If you look at the leftmost 8 bits, the magnitude ranges from 10000000 to 11111111, i.e. from 128 to 255.
These values can be obtained by this code piece
For negative numbers V will always lie from 128 upto 255. For positive numbers it will be from 0 to 127. As said earlier, the value of M will be 255 for -1 and 128 for highest negative number in 32-bit scheme. Build up your histogram as usual. Then from index 128 to 255 do the cumulative sum, then add frequency of 255 to 0, and proceed the cumulative sum from 0 till index 127. Perform the Sort as usual. This technique is both optimal, fast, elegant and neat both in theory and in practice. No need of any kind of separate lists nor order reversal after sorting nor converting all inputs to positive which make the sort slow and messy.
For the code see Radix Sort Optimization
A 64-bit version can be built using same concepts
Further read:
http://codercorner.com/RadixSortRevisited.htm
http://stereopsis.com/radix.html
This can be done without requiring partitioning or having to practically invert the MSB. Here's a working solution in Java:
One more solution is to separate negative integers from the array, make them positive, sort as positive values using radix, then reverse it and append with sorted non-negative array.
Absolutely! Of course you do have to take care of splitting up the negatives from the positives but luckily this is easy. At the beginning of your sorting algorithm all you have to do is partition your array around the value 0. After that, radix sort below and above the partition.
Here is the algorithm in practice. I derived this from Kevin Wayne and Bob Sedgewick's MSD radix sort: http://algs4.cs.princeton.edu/51radix/MSD.java.html
Note that the sign bit is the uppermost bit in a signed integer, but all numbers are treated by radix sort as unsigned integers by default. So you need to tell the algorithm that negative numbers are smaller than positive ones. In case of 32-bit signed integers, you can sort three lower bytes first, then sort the fourth (upper) byte with the sign bit inverted so that 0 will be used for negative numbers instead of 1, and consequently they will go first.
I strongly advise to sort numbers byte-by-byte rather than by decimal digits, because it's far easier for the machine to pick up bytes than extract digits.
You can treat the sign as a special kind of digit. You sort the pile on the units, then the tens, etc. and finally on the sign. This does produce a reversed order for the negatives, you then simply reverse the contents of that bucket. It's how old mechanical card sorters worked.