Answers So Far
So here is the code breakdown.
//Time: ~7s (linear loop algorithm)
//100,000! (456,574 decimal digits)
BigInteger bigIntVar = computeFactorial(100000);
//The first three here are just for comparison and are not actually Base 10.
bigIntVar.ToBase64String() //Time: 00.001s | Base 64 | Tetrasexagesimal
bigIntVar.ToString("x") //Time: 00.016s | Base 16 | Hexadecimal
bigIntVar.ToBinaryString() //Time: 00.026s | Base 02 | Binary
bigIntVar.ToQuickString() //Time: 11.200s | Base 10 | String Version
bigIntVar.ToQuickString() //Time: 12.500s | Base 10 | StringBuilder Version
bigIntVar.ToString() //Time: 13.300s | Base 10 | Original
Original Question Stuff
I have spent way too much time on this, so I need your help.
This is for a personal project to compute ginormous factorials (e.g. 100,000!).
Here is my code:
//Declared outside the using block so it is still in scope when the time is reported.
var timer = new Stopwatch();
using (var stream = new StreamWriter(fileName + ".txt", false))
{
    stream.WriteLine(header);
    timer.Restart();
    //This is the huge BigInteger holding the answer to 100,000!
    stream.WriteLine(saveFactorial.Output.ToString());
    //Let me be clear: ToString() is directly causing the 13sec time delay.
    //Not the stream.
    timer.Stop();
}
time = (timer.ElapsedMilliseconds / 1000.0).ToString() + "s";
MessageBox.Show(time);
Computing 100,000! takes about 7sec on my machine (linear loop algorithm).
Yet with this standard IO code it takes 13sec to save.
So in other words, it takes longer to save the work than it does to modestly compute it.
So I thought maybe I could use:
BigInteger.ToByteArray();
Although this runs extremely fast, I couldn't figure out how to save it to readable text.
You can use the above method to write the binary string to a text file with this self-made extension:
ToBinaryString
//Usage: string bigIntBinary = bigIntVar.ToBinaryString();
public static string ToBinaryString(this BigInteger source)
{
    //If you look up the ToByteArray() method...
    //It actually stores the bytes in reverse (little-endian) order.
    var bigIntBytes = source.ToByteArray().Reverse();
    StringBuilder bigIntBinary = new StringBuilder();
    foreach (var bigIntByte in bigIntBytes)
    {
        bigIntBinary.Append(Convert.ToString(bigIntByte, 2).PadLeft(8, '0'));
    }
    return bigIntBinary.ToString();
}
ToBase64String
//Usage: string bigIntBase64 = bigIntVar.ToBase64String();
public static string ToBase64String(this BigInteger source)
{
    var bigIntBytes = source.ToByteArray().Reverse().ToArray();
    return Convert.ToBase64String(bigIntBytes);
}
I also tried the math way (mod 10, etc...) to get each digit, but that takes a TON more time than ToString().
What am I doing wrong here?
This code is what I came up with based on the answer below.
This is faster than ToString(), but only by a couple seconds.
ToQuickString
//Usage: string bigIntString = bigIntVar.ToQuickString();
public static string ToQuickString(this BigInteger source)
{
    //powersOfTen[0] = 1, powersOfTen[k] = 10^(2^(k-1)) for every such power below source.
    powersOfTen = new List<BigInteger>();
    powersOfTen.Add(1);
    for (BigInteger i = 10; i < source; i *= i)
    {
        powersOfTen.Add(i);
    }
    string digits = BuildString(source, powersOfTen.Count - 1).TrimStart('0');
    //TrimStart('0') would otherwise turn zero into an empty string.
    return digits.Length == 0 ? "0" : digits;
}
private static List<BigInteger> powersOfTen;
private static string BuildString(BigInteger n, int m)
{
    //At m == 0 the value is a single digit, so every level emits a fixed-width block.
    if (m == 0)
        return n.ToString();
    BigInteger remainder;
    BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
    return BuildString(quotient, m - 1) + BuildString(remainder, m - 1);
}
Save the BigInteger data in binary or hex format. It is readable to the computer, and to sufficiently dedicated humans. ;>
Spending extra effort to make the output "human readable" is a waste of time. No human is going to be able to make sense out of 450,000 digits regardless of whether they are base 10, base 16, base 2, or anything else.
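As a rough illustration of the hex route, here is a minimal sketch that writes the number as hexadecimal text and reads it back. The class name BigIntegerHexFile and its two methods are made up for this example; it assumes the file holds nothing but the hex digits.

```csharp
using System.Globalization;
using System.IO;
using System.Numerics;

// Hypothetical helper, not part of the original code.
public static class BigIntegerHexFile
{
    // Writes the value as hexadecimal text using the fast ToString("x") path.
    public static void Save(string path, BigInteger value)
    {
        File.WriteAllText(path, value.ToString("x"));
    }

    // Reads the hexadecimal text back into a BigInteger.
    // ToString("x") prefixes positive values with a 0 digit when needed,
    // so the sign survives the round trip.
    public static BigInteger Load(string path)
    {
        string hex = File.ReadAllText(path).Trim();
        return BigInteger.Parse(hex, NumberStyles.AllowHexSpecifier);
    }
}
```

Usage would look like BigIntegerHexFile.Save("factorial.txt", bigIntVar); followed later by BigIntegerHexFile.Load("factorial.txt").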
UPDATE
Looking into the Base 10 conversion a little more closely, it is possible to cut the baseline performance of ToString almost in half using multiple threads on a multi core system. The main obstacle is that the largest consumer of time across the entire decimalization process is the first division operation on the original 450k digit number.
Stats on my quad core P7:
Generating a 500k digit random number using power and multiply: 5 seconds
Dividing that big number by anything just once: 11 seconds
ToString(): 22 seconds
ToQuickString: 18 seconds
ToStringMT: 12.9 seconds
public static class BigIntExtensions
{
    private static List<BigInteger> powersOfTen;

    // Must be called before ToStringMT()
    public static void InitPowersOfTen(BigInteger n)
    {
        powersOfTen = new List<BigInteger>();
        powersOfTen.Add(1);
        for (BigInteger i = 10; i < n; i *= i)
            powersOfTen.Add(i);
    }

    public static string ToStringMT(this BigInteger n)
    {
        // small (or non-positive) values are not worth splitting; this also keeps m - 2 >= 0 below.
        if (n <= 100)
            return n.ToString();
        // compute the index into the powersOfTen table for the given parameter. This is very fast.
        var m = (int)Math.Ceiling(Math.Log(BigInteger.Log10(n), 2));
        BigInteger r1;
        // the largest amount of execution time happens right here:
        BigInteger q1 = BigInteger.DivRem(n, BigIntExtensions.powersOfTen[m], out r1);
        // split the remaining work across 4 threads - 3 new threads plus the current thread
        var t1 = Task.Factory.StartNew<string>(() =>
        {
            BigInteger r1r2;
            BigInteger r1q2 = BigInteger.DivRem(r1, BigIntExtensions.powersOfTen[m - 1], out r1r2);
            var t2 = Task.Factory.StartNew<string>(() => BuildString(r1r2, m - 2));
            return BuildString(r1q2, m - 2) + t2.Result;
        });
        BigInteger q1r2;
        BigInteger q1q2 = BigInteger.DivRem(q1, BigIntExtensions.powersOfTen[m - 1], out q1r2);
        var t3 = Task.Factory.StartNew<string>(() => BuildString(q1r2, m - 2));
        var sb = new StringBuilder();
        sb.Append(BuildString(q1q2, m - 2));
        sb.Append(t3.Result);
        sb.Append(t1.Result);
        // every block is zero-padded, so strip the zeros in front of the leading block.
        return sb.ToString().TrimStart('0');
    }

    // same as ToQuickString, but bails out before m == 0 to reduce call overhead.
    // BigInteger.ToString() is faster than DivRem for smallish numbers.
    private static string BuildString(BigInteger n, int m)
    {
        // a block at level m is 2^m digits wide; pad it, or leading zeros inside the number are lost.
        if (m <= 8)
            return n.ToString().PadLeft(1 << m, '0');
        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
        return BuildString(quotient, m - 1) + BuildString(remainder, m - 1);
    }
}
For ToQuickString() and ToStringMT(), the powers of 10 array needs to be initialized prior to using these functions. Initializing this array shouldn't be included in function execution time measurements because the array can be reused across subsequent calls, so its initialization cost is amortized over the lifetime of the program, not individual function calls.
For a production system I would set up a more automatic initialization, such as initializing a reasonable number of entries in the class static constructor and then checking in ToQuickString() or ToStringMT() to see if there are enough entries in the table to handle the given BigInteger. If not, go add enough entries to the table to handle the current BigInteger, then continue with the operation.
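A minimal sketch of that on-demand initialization could look like the following. The class PowersOfTenTable and its members EnsureFor and Get are hypothetical names, and the lock is only an assumption about how callers on several threads might share the table.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical shared table; entry 0 is 1 and entry k (k >= 1) is 10^(2^(k-1)).
public static class PowersOfTenTable
{
    private static readonly List<BigInteger> powersOfTen = new List<BigInteger> { 1 };
    private static BigInteger nextPower = 10; // first power not yet stored in the table
    private static readonly object sync = new object();

    // Grows the table until it holds every power below n, then returns the
    // index of the largest entry smaller than n (0 when there is none above entry 0).
    public static int EnsureFor(BigInteger n)
    {
        lock (sync)
        {
            while (nextPower < n)
            {
                powersOfTen.Add(nextPower);
                nextPower *= nextPower;
            }
            // The table may already hold larger entries from an earlier call.
            int index = powersOfTen.Count - 1;
            while (index > 0 && powersOfTen[index] >= n)
                index--;
            return index;
        }
    }

    // Returns the entry at the given index.
    public static BigInteger Get(int index)
    {
        lock (sync)
        {
            return powersOfTen[index];
        }
    }
}
```

Because the table is shared and only ever grows, a caller would use the index returned by EnsureFor rather than the table's Count to pick its starting level.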
This ToStringMT function constructs the worker tasks manually to spread the remaining work out across 4 threads on the available execution cores in a multi core CPU. You could instead just make the original ToQuickString() function spin off half of its work into another thread on each recursion, but this quickly creates too many tasks and gets bogged down in task scheduling overhead. The recursion drills all the way down to individual decimal digits. I modified the BuildString() function to bail out earlier (m <= 8 instead of m == 0) because BigInteger.ToString() is faster than DivRem for smallish numbers.
90% of ToStringMT()'s execution time is taken up by the first DivRem call. It converges very quickly after that, but the first one is really painful.
First I'd calculate all numbers of the form 10^(2^m) smaller than n. Then I'd use DivRem with the largest of these to split the problem into two subproblems. Repeat that recursively until you're down to individual digits.
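// powersOfTen[0] = 1, powersOfTen[k] = 10^(2^(k-1)) for every such power below n
var powersOfTen = new List<BigInteger>();
powersOfTen.Add(1);
for (BigInteger i = 10; i < n; i = i * i)
    powersOfTen.Add(i);

// call with m = powersOfTen.Count - 1 and trim the leading zeros of the result
string ToString(BigInteger n, int m)
{
    if (m == 0)
        return n.ToString();
    BigInteger remainder;
    BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
    return ToString(quotient, m - 1) + ToString(remainder, m - 1);
}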
You can also optimize out the string concatenation entirely by directly writing into a character array.
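Here is one way the character-array idea might look; treat it as a sketch rather than tuned code. The names BigIntegerCharBuffer, ToDecimalString and Fill are invented for the example, it only handles non-negative values, and it uses i <= n when building the table so that an exact power such as 10 or 100 still fits the fixed-width buffer.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper illustrating the "write into a char array" variant.
public static class BigIntegerCharBuffer
{
    // Converts a non-negative value to decimal by filling a char[] in place.
    public static string ToDecimalString(BigInteger n)
    {
        if (n.Sign < 0) throw new ArgumentOutOfRangeException(nameof(n));

        // powers[0] = 1, powers[k] = 10^(2^(k-1)) for every such power not above n.
        var powers = new List<BigInteger> { 1 };
        for (BigInteger i = 10; i <= n; i *= i)
            powers.Add(i);

        int m = powers.Count - 1;
        // n is below 10^(2^m), so 2^m characters are always enough.
        var buffer = new char[1 << m];
        Fill(n, m, powers, buffer, 0);

        // Skip the zero padding in front, but keep at least one digit.
        int start = 0;
        while (start < buffer.Length - 1 && buffer[start] == '0')
            start++;
        return new string(buffer, start, buffer.Length - start);
    }

    // Writes exactly 2^m zero-padded digits of n starting at buffer[offset].
    private static void Fill(BigInteger n, int m, List<BigInteger> powers, char[] buffer, int offset)
    {
        if (m == 0)
        {
            buffer[offset] = (char)('0' + (int)n); // n is a single digit here
            return;
        }
        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powers[m], out remainder);
        int half = 1 << (m - 1);
        Fill(quotient, m - 1, powers, buffer, offset);
        Fill(remainder, m - 1, powers, buffer, offset + half);
    }
}
```

Each recursive call knows the exact position of its digits, so no intermediate strings are allocated or concatenated.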
Alternatively you could consider using base 1,000,000,000 (10^9) during all the calculations. That way you don't need the base conversion at the end at all. That's probably much faster for factorial calculation.
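// f1 holds base-10^9 digits, least significant first; f2 must be non-negative
List<int> multiply(List<int> f1, int f2)
{
    var result = new List<int>(f1.Count + 1);
    long carry = 0;
    for (int i = 0; i < f1.Count; i++)
    {
        // add the carry from the previous digit before splitting the product
        long product = (long)f1[i] * f2 + carry;
        carry = product / 1000000000;
        result.Add((int)(product % 1000000000));
    }
    if (carry != 0)
        result.Add((int)carry);
    return result;
}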
Now conversion to a base 10 string is trivial and cheap.
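To make that last step concrete, here is a small sketch of the conversion, assuming the same layout as the multiply routine (a List<int> of base-10^9 digits, least significant first). The names Base1e9 and ToDecimalString are made up for the example.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for a number stored as base-10^9 digits, least significant first.
public static class Base1e9
{
    public static string ToDecimalString(List<int> limbs)
    {
        if (limbs.Count == 0)
            return "0";

        var sb = new StringBuilder(limbs.Count * 9);
        // The most significant digit is written without padding.
        sb.Append(limbs[limbs.Count - 1]);
        // Every lower digit contributes exactly nine decimal characters.
        for (int i = limbs.Count - 2; i >= 0; i--)
            sb.Append(limbs[i].ToString("D9"));
        return sb.ToString();
    }
}
```

The whole conversion is a single linear pass over the list, with no big-number division involved.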