I want to sort a large array of integers (say 1 millon elements) lexicographically.
Example:
input [] = { 100, 21 , 22 , 99 , 1 , 927 }
sorted[] = { 1 , 100, 21 , 22 , 927, 99 }
I have done it using the simplest possible method:
- convert all numbers to strings (very costly because it will take huge memory)
- use
std:sort
withstrcmp
as comparison function - convert back the strings to integers
Is there a better method than this?
The task sounds like a natural fit for an MSD variant of Radix Sort with padding ( http://en.wikipedia.org/wiki/Radix_sort ).
Depends on how much code you want to throw at it. The simple code as the others show is O(log n) complexity, while a fully optimized radix sort would be O(kn).
A compact solution if all your numbers are nonnegative and they are small enough so that multiplying them by 10 doesn't cause an overflow:
Run it like this:
You could try using the % operator to give you access to each individual digit eg 121 % 100 will give you the first digit and check that way but you'll have to find a way to get around the fact they have different sizes.
So find the maximum value in array. I don't know if theres a function for this in built you could try.
This function will return the number of digits in the number
Let number of digits in max equal n. Once you have this open a for loop in the format of for (int i = 1; i < n ; i++)
then you can go through your and use "data[i] % (10^(n-i))" to get access to the first digit then sort that and then on the next iteration you'll get access to the second digit. I Don't know how you'll sort them though.
It wont work for negative numbers and you'll have to get around data[i] % (10^(n-i)) returning itself for numbers with less digits than max
Here is the dumb solution that doesn't use any floating point tricks. It's pretty much the same as the string comparison, but doesn't use a string per say, doesn't also handle negative numbers, to do that add a section at the top...
It's fast, and I'm sure it's possible to make it faster still, but it works and it's dumb enough to understand...
EDIT: I ate to dump lots of code, but here is a comparison of all the solutions so far..
Overload the < operator to compare two integers lexicographically. For each integer, find the smallest 10^k, which is not less than the given integer. Than compare the digits one by one.
Use
std::sort()
with a suitable comparison function. This cuts down on the memory requirements.The comparison function can use
n % 10
,n / 10 % 10
,n / 100 % 10
etc. to access the individual digits (for positive integers; negative integers work a bit differently).