In data formats where all underlying types are strings, numeric types must be converted to a standardized string format that can be compared alphabetically. For example, a short for the value 27 could be represented as 00027 if there are no negatives.
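As a minimal sketch of that zero-padding idea, assuming non-negative values only (the class and method names here are illustrative, not from any library):

```csharp
using System;

static class PaddedShort
{
    // short.MaxValue is 32767, so five digits are always enough.
    public static string ToSortableString(short n)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n), "negatives are not handled in this sketch");

        // "D5" pads with leading zeros to a fixed width of five digits.
        return n.ToString("D5");
    }
}
```

With this, PaddedShort.ToSortableString(27) gives 00027, which sorts before 00100 in an ordinal string comparison.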
What's the best way to represent a double as a string? In my case I can ignore negatives, but I'd be curious how you'd represent the double in either case.
UPDATE
Based on Jon Skeet's suggestion, I'm now using this, though I'm not 100% sure it'll work correctly:
static readonly string UlongFormatString = new string('0', ulong.MaxValue.ToString().Length);
public static string ToSortableString(this double n)
{
    return BitConverter.ToUInt64(BitConverter.GetBytes(BitConverter.DoubleToInt64Bits(n)), 0).ToString(UlongFormatString);
}
public static double DoubleFromSortableString(this string n)
{
    return BitConverter.Int64BitsToDouble(BitConverter.ToInt64(BitConverter.GetBytes(ulong.Parse(n)), 0));
}
UPDATE 2
I have confirmed what Jon suspected - negatives don't work using this method. Here is some sample code:
void Main()
{
    var a = double.MaxValue;
    var b = double.MaxValue/2;
    var c = 0d;
    var d = double.MinValue/2;
    var e = double.MinValue;
    Console.WriteLine(a.ToSortableString());
    Console.WriteLine(b.ToSortableString());
    Console.WriteLine(c.ToSortableString());
    Console.WriteLine(d.ToSortableString());
    Console.WriteLine(e.ToSortableString());
}
static class Test
{
    static readonly string UlongFormatString = new string('0', ulong.MaxValue.ToString().Length);
    public static string ToSortableString(this double n)
    {
        return BitConverter.ToUInt64(BitConverter.GetBytes(BitConverter.DoubleToInt64Bits(n)), 0).ToString(UlongFormatString);
    }
}
Which produces the following output:
09218868437227405311
09214364837600034815
00000000000000000000
18437736874454810623
18442240474082181119
Clearly not sorted as expected.
UPDATE 3
The accepted answer below is the correct one. Thanks guys!
I believe that a modified scientific notation, with the exponent first, and using underscore for positive, would sort lexically in the same order as numerically.
If you want, you can even append the normal representation, since a suffix won't affect sorting.
Examples
Unfortunately, it doesn't work for either negative numbers or negative exponents. You could introduce a bias for the exponent, like the IEEE format uses internally.
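A rough sketch of the biased-exponent variant, for zero and positive finite values only. The bias of 400, the underscore separator, and the class name are assumptions chosen for illustration; any bias above 324 keeps the biased exponent non-negative for every double.

```csharp
using System;
using System.Globalization;

static class ExponentFirst
{
    // Assumption: double exponents range from -324 to +308, so 400 keeps the sum in 0..999.
    const int Bias = 400;

    public static string ToSortableString(double n)
    {
        if (n < 0 || double.IsNaN(n) || double.IsInfinity(n))
            throw new ArgumentOutOfRangeException(nameof(n), "only zero and positive finite values are handled");

        // "E16" yields d.ddddddddddddddddE+ddd or d.ddddddddddddddddE-ddd.
        string[] parts = n.ToString("E16", CultureInfo.InvariantCulture).Split('E');

        // Zero has no meaningful exponent, so give it the smallest possible key.
        int exponent = n == 0 ? -Bias : int.Parse(parts[1], CultureInfo.InvariantCulture);

        // Biased exponent first (fixed width), then the fixed-width mantissa.
        return (exponent + Bias).ToString("D3", CultureInfo.InvariantCulture) + "_" + parts[0];
    }
}
```

For example, 27.0 becomes 401_2.7000000000000000. Negative values would still need separate treatment, such as complementing the digits.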
Padding is potentially rather awkward for doubles, given the enormous range (double.MaxValue is 1.7976931348623157E+308).
Does the string representation still have to be human-readable, or just reversible?
That gives a reversible conversion leading to a reasonably short string representation preserving lexicographic ordering - but it wouldn't be at all obvious what the double value was just from the string.
EDIT: Don't use BitConverter.DoubleToInt64Bits alone. That reverses the ordering for negative values.
I'm sure you can perform this conversion using DoubleToInt64Bits and then some bit-twiddling, but unfortunately I can't get it to work right now, and I have three kids who are desperate to go to the park...
In order to make everything sort correctly, negative numbers need to be stored in ones-complement format instead of sign magnitude (otherwise negatives and positives sort in opposite orders), and the sign bit needs to be flipped (to make negatives sort less than positives). This code should do the trick:
Demonstration here: http://ideone.com/JPNPY
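A minimal sketch of the transformation described above (not necessarily the answerer's original code; the class and method names are illustrative). Negative values get all bits inverted, and non-negative values get only the sign bit flipped, so the resulting unsigned integers compare in the same order as the doubles:

```csharp
using System;

static class SortableBits
{
    const ulong SignBit = 0x8000000000000000UL;

    public static ulong Encode(double d)
    {
        // Reinterpret the IEEE 754 bit pattern as an unsigned 64-bit integer.
        ulong bits = unchecked((ulong)BitConverter.DoubleToInt64Bits(d));

        // Negative: invert every bit, which reverses the magnitude order and clears the sign bit.
        // Non-negative: set the sign bit so these values sort above all negatives.
        return (bits & SignBit) != 0 ? ~bits : bits ^ SignBit;
    }
}
```

Applied to the five sample values from the question, Encode yields keys in ascending order from double.MinValue up to double.MaxValue. NaN is not treated specially in this sketch.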
Here's the complete solution, to and from strings:
Demonstration: http://ideone.com/pFciY
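As a hedged sketch of what a round trip to and from strings could look like under the same scheme (names are illustrative and reuse the question's method names as plain static methods; this is not the answerer's verbatim code):

```csharp
using System;
using System.Globalization;

static class SortableDouble
{
    const ulong SignBit = 0x8000000000000000UL;

    public static string ToSortableString(double d)
    {
        ulong bits = unchecked((ulong)BitConverter.DoubleToInt64Bits(d));
        ulong key = (bits & SignBit) != 0 ? ~bits : bits ^ SignBit;

        // ulong.MaxValue has 20 decimal digits, so pad every key to that width.
        return key.ToString("D20", CultureInfo.InvariantCulture);
    }

    public static double DoubleFromSortableString(string s)
    {
        ulong key = ulong.Parse(s, CultureInfo.InvariantCulture);

        // Undo the encoding: keys with the top bit set came from non-negative doubles.
        ulong bits = (key & SignBit) != 0 ? key ^ SignBit : ~key;
        return BitConverter.Int64BitsToDouble(unchecked((long)bits));
    }
}
```

The strings are fixed-width, so an ordinal comparison of two keys matches the numeric comparison of the original doubles, and decoding recovers the exact bit pattern.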
As it turns out... The org.apache.solr.util package contains the NumberUtils class. This class has static methods that do everything needed to convert doubles (and other data values) to sortable strings (and back). The methods could not be easier to use. A few notes:
The code below shows what needs to be done to use this library.