As part of my BigDecimal library, I need to calculate the factorial of any given non-negative integer, so I'm using .NET 4.0's System.Numerics.BigInteger to be able to store huge numbers. Here is the function I'm using:
private BigInteger Factorial(BigInteger x)
{
    BigInteger res = x;
    x--;
    while (x > 1)
    {
        res *= x;
        x--;
    }
    return res;
}
It's working but not optimized. Now I want to use parallel computing, so here is what I've tried (keep in mind I have no experience with parallel programming):
public BigInteger Factorial(long x)
{
    BigInteger res = 1;
    ParallelLoopResult r = Parallel.For(2L, (x + 1), i =>
        res *= i
    );
    return res;
}
The weird problem is that the above function works perfectly for small numbers like 5!, but for big numbers like 1000! it returns a completely different result each time. So I realized it's not thread-safe and the problem is the shared variable res. What is the correct implementation? It would also be nicer if I could use BigInteger instead of long for the variable x.
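For what it's worth, here is a minimal sketch (illustrative only, not a drop-in from any particular answer; the name FactorialThreadLocal is made up, and it still takes a long) of one way to remove that race: the Parallel.For overload with thread-local state lets each thread build its own partial product, and the shared result is only touched under a lock once per thread.
public BigInteger FactorialThreadLocal(long x)
{
    BigInteger result = BigInteger.One;
    object gate = new object();
    Parallel.For(2L, x + 1,
        () => BigInteger.One,               // per-thread seed
        (i, state, local) => local * i,     // multiply into the thread-local product
        local =>                            // runs once per thread when it finishes
        {
            lock (gate) { result *= local; }
        });
    return result;
}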
You need to make sure your parallel processes do not share any state.
For example, in the case of the factorial, I would do the following:
- set a degree of parallelism (DOP) - usually the number of processors on your machine for compute-bound operations
- divide the problem into independent subsets
- process each subset in parallel
- aggregate the results obtained from the parallel processes
This is essentially a simplified map-reduce.
The problem consists in multiplying together a set of numbers. One way to divide this set into subsets is to use N parallel loops, where each one starts at a value i (with 0 < i <= N) and advances with a step of N (and N = DOP). For example, with N = 4 and x = 10, the subsets are {1, 5, 9}, {2, 6, 10}, {3, 7} and {4, 8}.
Here's the code that does that:
/// <summary>
/// The max number of parallel tasks
/// </summary>
static readonly int DegreeOfParallelism = Environment.ProcessorCount;

public BigInteger Factorial(long x)
{
    // Make as many parallel tasks as our DOP
    // and make them operate on separate subsets of the data
    var parallelTasks =
        Enumerable.Range(1, DegreeOfParallelism)
                  .Select(i => Task.Factory.StartNew(() => Multiply(x, i),
                                                     TaskCreationOptions.LongRunning))
                  .ToArray();

    // after all tasks are done...
    Task.WaitAll(parallelTasks);

    // ... take the partial results and multiply them together
    BigInteger finalResult = 1;
    foreach (var partialResult in parallelTasks.Select(t => t.Result))
    {
        finalResult *= partialResult;
    }

    return finalResult;
}

/// <summary>
/// Multiplies all the integers up to upperBound, with a step equal to DOP,
/// each call starting from a different integer
/// </summary>
/// <param name="upperBound">The largest factor to include</param>
/// <param name="startFrom">The first factor of this subset</param>
/// <returns>The product of the subset</returns>
public BigInteger Multiply(long upperBound, int startFrom)
{
    BigInteger result = 1;
    for (var i = startFrom; i <= upperBound; i += DegreeOfParallelism)
        result *= i;
    return result;
}
On my machine this computes 100000! in about 30 seconds, and the result is what Wolfram Alpha says it should be.
Update
After running a few more tests, I discovered something I didn't expect: printing the 100000! result to the console takes ~18 seconds (the result has 456574 digits).
The results of the 100000! computation alone (without printing the number) are:
- ~10 seconds for parallel execution
- ~16 seconds for sequential execution
Foreword
Based on some initial and very simple benchmarking, the parallel version works faster for really big factorials (larger than ~1000!). For smaller ones, the overhead of parallel processing trumps everything else and the sequential version is simply faster.
Code
That being said, here's what I got to work in LINQPad:
public static class Math
{
    // Sequential execution
    public static System.Numerics.BigInteger Factorial(System.Numerics.BigInteger x)
    {
        System.Numerics.BigInteger res = x;
        x--;
        while (x > 1)
        {
            res *= x;
            x--;
        }
        return res;
    }

    // Parallel execution via PLINQ
    public static System.Numerics.BigInteger FactorialPar(System.Numerics.BigInteger x)
    {
        return NextBigInt().TakeWhile(i => i <= x).AsParallel().Aggregate((acc, item) => acc * item);
    }

    // Infinite sequence of BigIntegers: 1, 2, 3, ...
    public static IEnumerable<System.Numerics.BigInteger> NextBigInt()
    {
        System.Numerics.BigInteger x = 0;
        while (true)
        {
            yield return (++x);
        }
    }
}
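A quick sanity check of both methods could look like this (hypothetical snippet; Math here refers to the class defined above, not System.Math):
Console.WriteLine(Math.Factorial(5));     // 120
Console.WriteLine(Math.FactorialPar(5));  // 120
Console.WriteLine(Math.Factorial(6));     // 720
Console.WriteLine(Math.FactorialPar(6));  // 720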
FactorialPar works for both small (5! = 120, 6! = 720) and large (~8000!) factorials. As I mentioned though, there's a speed increase (2-3 times faster) for large factorials, but a serious performance penalty (up to two orders of magnitude) for small ones (results after warmup in LINQPad):
- 6! x 20 -> Serial avg ticks/std dev: 4.2/2.014, Parallel avg ticks/std dev: 102.6/39.599 (parallel execution is 25 times slower...)
- 300! x 20 -> Serial avg ticks/std dev: 104.35, Parallel avg ticks/std dev: 405.55/175.44 (parallel runs at 1/4th of sequential, speed-wise)
- 1000! x 20 -> Serial avg ticks/std dev: 2672.05/615.744, Parallel avg ticks/std dev: 3778.65/3197.308 (parallel runs at ~70-90% of sequential speed)
- 10000! x 20 -> Serial avg ticks/std dev: 286774.95/13666.607, Parallel avg ticks/std dev: 144932.25/16671.931 (parallel is 2x faster)
Take those with a grain of salt; you'd need to compile a release version and run it as a standalone to get 'real' results, but there's a trend there that is worth taking into consideration.
100000! (with printing and everything) took 26 seconds on my machine with parallel execution in LINQPad.
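For reference, a standalone Release-mode harness along those lines might look roughly like this (a sketch only: the Benchmark class, the choice of 20 runs and the use of 1000! are assumptions, and it reuses the Math class defined above, compiled into the same project):
using System;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    static void Main()
    {
        const int runs = 20;
        System.Numerics.BigInteger n = 1000;

        // Warm up both paths so JIT compilation doesn't skew the first sample
        Math.Factorial(n);
        Math.FactorialPar(n);

        Console.WriteLine("Serial avg ticks:   {0}", AverageTicks(runs, () => Math.Factorial(n)));
        Console.WriteLine("Parallel avg ticks: {0}", AverageTicks(runs, () => Math.FactorialPar(n)));
    }

    // Average elapsed Stopwatch ticks over the given number of runs
    static double AverageTicks(int runs, Action action)
    {
        var samples = new long[runs];
        for (var i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.ElapsedTicks;
        }
        return samples.Average();
    }
}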
Try this for a simpler solution:
Func<int, BigInteger> factorialAgg = n => n < 2
    ? BigInteger.One
    : Enumerable.Range(2, n - 1)
                .AsParallel()
                .Aggregate(BigInteger.One, (r, i) => r * i);
var result = factorialAgg(100000);
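If you want to be explicit about how PLINQ combines per-partition results, the Aggregate overload that also takes a combine function can be used instead. Here is a sketch (the name factorialPartitioned is made up; the seed of 1 is safe to apply per partition because it's the multiplicative identity):
Func<int, BigInteger> factorialPartitioned = n => n < 2
    ? BigInteger.One
    : Enumerable.Range(2, n - 1)
        .AsParallel()
        .Aggregate(
            BigInteger.One,                 // seed for each partition
            (acc, i) => acc * i,            // fold within a partition
            (left, right) => left * right,  // combine the partition products
            acc => acc);                    // final projection

var result2 = factorialPartitioned(100000);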