new BigInteger(String) performance / complexity

Posted 2019-06-15 11:04

Question:

I'm wondering about the performance/complexity of constructing BigInteger objects with the new BigInteger(String) constructor.

Consider the following method:

  public static void testBigIntegerConstruction()
  {
    for (int exp = 1; exp < 10; exp++)
    {
      // Build a string of 10^exp decimal digits ...
      StringBuffer bigNumber = new StringBuffer((int) Math.pow(10.0, exp));
      for (int i = 0; i < Math.pow(10.0, exp - 1); i++)
      {
        bigNumber.append("1234567890");
      }

      // ... and measure only the construction of the BigInteger from it.
      String val = bigNumber.toString();
      long time = System.currentTimeMillis();
      BigInteger bigOne = new BigInteger(val);
      System.out.println("time for constructing a 10^" + exp
          + " digits BigInteger : " + ((System.currentTimeMillis() - time))
          + " ms");
    }
  }

This method creates BigInteger objects from Strings with 10^x digits, where x starts at 1 and increases with every iteration. It measures and outputs the time required for constructing the corresponding BigInteger object.

On my machine (Intel Core i5 660, JDK 6 Update 25 32 bit) the output is:

time for constructing a 10^1 digits BigInteger : 0 ms
time for constructing a 10^2 digits BigInteger : 0 ms
time for constructing a 10^3 digits BigInteger : 0 ms
time for constructing a 10^4 digits BigInteger : 16 ms
time for constructing a 10^5 digits BigInteger : 656 ms
time for constructing a 10^6 digits BigInteger : 59936 ms
time for constructing a 10^7 digits BigInteger : 6227975 ms

Ignoring the lines up to 10^5 (because of possible distortions introduced by (processor) caching effects, JIT compilation, etc.), we can clearly see O(n^2) complexity here: each tenfold increase in the number of digits multiplies the running time by roughly a hundred. Keeping in mind that every operation on a BigInteger creates a new one due to immutability, this is a major performance penalty for huge numbers.

Questions:

  • Did I miss something?

  • Why is this the case?

  • Is this fixed in more recent JDKs?

  • Are there any alternatives?

UPDATE:

I did further measurements and can confirm the statement from some of the answers:
It seems that BigInteger is optimized for subsequent numerical operations at the expense of higher construction costs for huge numbers, which seems reasonable to me.

Answer 1:

Simplifying from the source somewhat, it's the case because in the "traditional" String parsing loop

for each digit y from left to right:
  x = 10 * x + y

you have the issue that 10 * x takes time linear in the length of x, unavoidably, and that length grows by a more-or-less constant amount per digit, also unavoidably.
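
In Java terms, a minimal sketch of that schoolbook loop might look like this (parseDecimal is a hypothetical name for illustration, not the JDK's actual code; assumes import java.math.BigInteger and a digits-only input):

  static BigInteger parseDecimal(String s)
  {
    BigInteger x = BigInteger.ZERO;
    for (int i = 0; i < s.length(); i++)
    {
      int digit = s.charAt(i) - '0';
      // 10 * x is linear in the current length of x, and x grows with
      // every digit, so the total work is O(1 + 2 + ... + n) = O(n^2).
      x = x.multiply(BigInteger.TEN).add(BigInteger.valueOf(digit));
    }
    return x;
  }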

(The actual implementation is somewhat smarter than this -- it parses an int's worth of decimal digits at a time, so the actual multiplier in the loop is more likely 1 or 2 billion -- but yeah, it's still quadratic overall.)
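
A sketch of that grouping trick, assuming for simplicity that the input contains only digits and its length is a multiple of 9 (parseGrouped and TEN_POW_9 are my names, not the JDK's):

  static final BigInteger TEN_POW_9 = BigInteger.valueOf(1000000000L);

  static BigInteger parseGrouped(String s)
  {
    BigInteger x = BigInteger.ZERO;
    for (int i = 0; i < s.length(); i += 9)
    {
      // Parse 9 digits cheaply with plain int arithmetic, then fold them
      // in with a single big multiply-add. That divides the constant by
      // about 9, but each multiply is still linear in the length of x,
      // so the overall complexity remains O(n^2).
      int group = Integer.parseInt(s.substring(i, i + 9));
      x = x.multiply(TEN_POW_9).add(BigInteger.valueOf(group));
    }
    return x;
  }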

That said, a number with 10^6 digits is at least a googol, and that's bigger than any number I've heard of being used even for cryptographic purposes. You're parsing a string that takes two megabytes of memory. Yes, it'll take a while, but I suspect the JDK authors didn't see the point of optimizing for such a rare use case.



Answer 2:

The O(n^2) effort is caused by the decimal-to-binary conversion that takes place when the BigInteger is specified as decimal digits.

Also, 10^7 digits is a really huge number. For typical cryptographic algorithms like RSA you would deal with 10^3 to 10^4 digits. Most of the BigInteger operations are not optimized for such a large number of digits.
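
If the number originates in binary form anyway, as cryptographic key material typically does, the decimal conversion can be avoided entirely by constructing the BigInteger from bytes, which is linear in the input size. A minimal sketch (the 512-byte magnitude is an arbitrary stand-in for real key material):

  import java.math.BigInteger;
  import java.security.SecureRandom;

  byte[] magnitude = new byte[512]; // 4096 bits
  new SecureRandom().nextBytes(magnitude);

  // BigInteger(int signum, byte[] magnitude) just copies the bits;
  // no base conversion is involved.
  BigInteger n = new BigInteger(1, magnitude);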



Answer 3:

You're actually measuring the time it takes to parse a string and create the BigInteger. Numeric operations involving BigIntegers would be a lot more efficient than this.
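
A rough way to see this is to time an arithmetic operation on the already-constructed value; a sketch along the lines of the question's benchmark (exact numbers will of course vary by machine):

  import java.math.BigInteger;

  public class ParseVsAdd
  {
    public static void main(String[] args)
    {
      // Build a 10^6-digit decimal string, as in the question.
      StringBuilder sb = new StringBuilder(1000000);
      for (int i = 0; i < 100000; i++)
      {
        sb.append("1234567890");
      }
      String val = sb.toString();

      long t0 = System.currentTimeMillis();
      BigInteger big = new BigInteger(val); // quadratic decimal parse
      long t1 = System.currentTimeMillis();
      BigInteger sum = big.add(big);        // linear in the bit length
      long t2 = System.currentTimeMillis();

      System.out.println("parse: " + (t1 - t0) + " ms, add: " + (t2 - t1)
          + " ms (sum has " + sum.bitLength() + " bits)");
    }
  }

Since the add is linear while the parse is quadratic, the add should come out orders of magnitude faster on inputs of this size.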