I'm wondering about the performance/complexity of constructing BigInteger objects with the new BigInteger(String) constructor.
Consider the following method:
import java.math.BigInteger;

public static void testBigIntegerConstruction()
{
    for (int exp = 1; exp < 10; exp++)
    {
        // Build a string of 10^exp digits by appending 10^(exp-1) ten-digit chunks.
        StringBuffer bigNumber = new StringBuffer((int) Math.pow(10.0, exp));
        for (int i = 0; i < Math.pow(10.0, exp - 1); i++)
        {
            bigNumber.append("1234567890");
        }
        String val = bigNumber.toString();

        long time = System.currentTimeMillis();
        BigInteger bigOne = new BigInteger(val);
        System.out.println("time for constructing a 10^" + exp
                + " digits BigInteger : " + (System.currentTimeMillis() - time)
                + " ms");
    }
}
This method creates BigInteger objects from Strings with 10^x digits, where x starts at 1 and is increased with every iteration. It measures and prints the time required to construct the corresponding BigInteger object.
On my machine (Intel Core i5 660, JDK 6 Update 25, 32-bit), the output is:
time for constructing a 10^1 digits BigInteger : 0 ms
time for constructing a 10^2 digits BigInteger : 0 ms
time for constructing a 10^3 digits BigInteger : 0 ms
time for constructing a 10^4 digits BigInteger : 16 ms
time for constructing a 10^5 digits BigInteger : 656 ms
time for constructing a 10^6 digits BigInteger : 59936 ms
time for constructing a 10^7 digits BigInteger : 6227975 ms
Ignoring the lines up to 10^5 (because of possible distortions introduced by processor caching effects, JIT compilation, etc.), we can clearly see a complexity of O(n^2) here: each tenfold increase in the number of digits increases the construction time by roughly a factor of 100.
Keeping in mind that every operation on a BigInteger creates a new one due to immutability, this is a major performance penalty for huge numbers.
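To illustrate the immutability point, here is a minimal sketch of my own (illustrative, not benchmarked): every arithmetic call below returns a brand-new object instead of modifying the existing one.

BigInteger sum = BigInteger.ZERO;
for (int i = 0; i < 1000; i++)
{
    // add() does not modify sum in place; it allocates and returns a new
    // BigInteger, so 1000 iterations create 1000 intermediate objects.
    sum = sum.add(BigInteger.ONE);
}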
Questions:
Did I miss something?
Why is this the case?
Is this fixed in more recent JDKs?
Are there any alternatives?
UPDATE:
I did further measurements, and I can confirm the statement from some of the answers: it seems that BigInteger is optimized for subsequent numerical operations at the expense of higher construction costs for huge numbers, which seems reasonable to me.
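A rough sketch of how such a comparison could look (the method name and sizes are my own choices, not the exact measurements I ran): construction pays for a decimal-to-binary conversion, while an addition operates directly on the binary representation and is linear.

public static void compareConstructionToArithmetic()
{
    // Build a 10^5-digit decimal string, as in the original test.
    StringBuffer bigNumber = new StringBuffer(100000);
    for (int i = 0; i < 10000; i++)
    {
        bigNumber.append("1234567890");
    }
    String val = bigNumber.toString();

    long time = System.currentTimeMillis();
    BigInteger a = new BigInteger(val); // pays the decimal-to-binary conversion
    System.out.println("construction : " + (System.currentTimeMillis() - time) + " ms");

    time = System.currentTimeMillis();
    BigInteger b = a.add(a);            // works on the binary form directly
    System.out.println("addition     : " + (System.currentTimeMillis() - time) + " ms");
}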
Simplifying from the source somewhat, it's the case because in the "traditional" String parsing loop you have the issue that 10 * x takes time linear in the length of x, unavoidably, and that length grows by more-or-less a constant factor for each digit, also unavoidably.

(The actual implementation is somewhat smarter than this -- it tries to parse an int's worth of binary digits at a time, and so the actual multiplier in the loop is more likely 1 or 2 billion -- but yeah, it's still quadratic overall.)

That said, a number with 10^6 digits is at least a googol, and that's bigger than any number I've heard of being used even for cryptographic purposes. You're parsing a string that takes two megabytes of memory. Yes, it'll take a while, but I suspect the JDK authors didn't see the point of optimizing for such a rare use case.

You're actually measuring the time it takes to parse a string and create the BigInteger. Numeric operations involving BigIntegers would be a lot more efficient than this.
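For concreteness, here is a minimal sketch of the "traditional" parsing loop described above (my own illustration, not the JDK's actual code): each multiply-by-10 step touches every word of the value accumulated so far, which is what makes the whole parse quadratic.

static BigInteger naiveParse(String s)
{
    BigInteger result = BigInteger.ZERO;
    for (int i = 0; i < s.length(); i++)
    {
        // multiply(TEN) costs time proportional to the current length of
        // result, and that length grows steadily with each parsed digit,
        // so the total work over n digits is O(n^2).
        result = result.multiply(BigInteger.TEN)
                       .add(BigInteger.valueOf(s.charAt(i) - '0'));
    }
    return result;
}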
The O(n^2) effort is caused by the decimal-to-binary conversion if the BigInteger is specified as decimal digits.

Also, 10^7 digits is a really huge number. For typical cryptographic algorithms like RSA, you would deal with 10^3 to 10^4 digits. Most of the BigInteger operations are not optimized for such a large number of digits.
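Building on that point: if the value is already available in binary form, the conversion can be skipped entirely. As a sketch of my own (not from the answers above), the BigInteger(int, byte[]) constructor interprets the array as a big-endian magnitude and essentially just validates and copies it, so construction is linear in the input size.

import java.math.BigInteger;
import java.util.Arrays;

public static void constructFromBytes()
{
    // A large magnitude array; BigInteger(int, byte[]) only validates and
    // copies it, so this runs in time linear in the array length.
    byte[] magnitude = new byte[2 * 1024 * 1024];
    Arrays.fill(magnitude, (byte) 0x55);

    long time = System.currentTimeMillis();
    BigInteger big = new BigInteger(1, magnitude); // 1 = positive signum
    System.out.println("construction from bytes : "
            + (System.currentTimeMillis() - time) + " ms");
}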