Converting a big integer to decimal string

Posted 2019-01-26 20:28

At the risk of having this question voted a duplicate, or even closed, I'm asking it because it has come up for me.

Background

In "normal" data types such as int, long long, etc., converting from the binary numeric value to a decimal string goes like this (in pseudocode):

Set length = 0.
Set divisor to the largest power of 10 the data type can hold.
  Loop while divisor >= 1:
    Divide the number by the divisor; place the quotient digit in the string at position length.
    Set the number to the remainder of that division.
    Increment length by 1.
    Divide the divisor by 10.
Print the string. (The digits emerge most significant first, so no reversal is needed.)

The actual implementation in (most) any language is quite trivial.
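For a fixed-width type, the steps above might look like the following sketch (the function name `u32_to_decimal` and the leading-zero handling are my own, not from any particular library):

```c
/* Fixed-divisor conversion for a 32-bit value: the largest base-10
   divisor, 1000000000, is known at compile time. */
static void u32_to_decimal(unsigned long x, char *out) {
    unsigned long divisor = 1000000000UL; /* largest power of 10 in 32 bits */
    char *p = out;
    int started = 0;
    while (divisor > 0) {
        int digit = (int)(x / divisor);
        x %= divisor;
        divisor /= 10;
        if (digit != 0)
            started = 1;
        if (started || divisor == 0) /* skip leading zeros, keep a final "0" */
            *p++ = (char)('0' + digit);
    }
    *p = '\0';
}
```

Because the divisor descends from the largest power of 10, the digits arrive most significant first and the buffer never needs reversing.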

The Problem

The issue I am encountering with the above method is that with big integers (also known as arbitrary-precision arithmetic), there is no largest base-10 value to start with. So the question is: how do you initialize the divisor to the largest possible base-10 value if there is no way to know what that value is?

What I Have Tried

Still trying to draft a solution.

Research

Some of the links that I have found here include the following:

Convert a "big" Hex number (string format) to a decimal number (string format) without BigInteger Class

C: print a BigInteger in base 10

Fastest way to convert a BigInteger to a decimal (Base 10) string?

A Google search turned up other things, but nothing that specifically answers my question.

Ideas

One method that I think might work is the following (in pseudocode):

Define p_divisor (the previous divisor).
Set divisor = 1.
  Loop:
    if divisor <= dividend
      then
        Set p_divisor = divisor
        divisor = divisor * 10
      else
        end loop
Set divisor = p_divisor (the largest power of 10 not exceeding the dividend).
  Loop while divisor >= 1:
    Divide the number by the divisor; place the quotient digit in the string at position length.
    Set the number to the remainder of that division.
    Increment length by 1.
    Divide the divisor by 10.
Print the string. (Again, most significant digit first; no reversal needed.)

Would this be the correct way? I have a big-integer library up and working (including multiplication and division), so it wouldn't be hard to pull this off. The big issue I see with this method is performance: you have to run a multiplication sequence to get the initial divisor, and then divide twice for each base-10 position, once to extract the digit and once to shrink the divisor.
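A sketch of the idea above, using ordinary `unsigned long long` arithmetic as a stand-in for the big-integer operations (the name `to_decimal` is mine):

```c
/* Two-phase method: grow the divisor to the largest power of 10
   not exceeding n, then walk it back down, one digit per step.
   With a big-integer library, each /, %, and * below becomes the
   corresponding library call. */
static void to_decimal(unsigned long long n, char *out) {
    unsigned long long divisor = 1;
    while (divisor <= n / 10)   /* compare against n/10 so divisor never overflows */
        divisor *= 10;
    char *p = out;
    while (divisor > 0) {
        *p++ = (char)('0' + n / divisor); /* most significant digit first */
        n %= divisor;
        divisor /= 10;
    }
    *p = '\0';
}
```

Note that this only ever compares the divisor against the number itself, so no "largest base-10 value" needs to be known in advance; the cost is the extra multiplication loop and the second division chain the question worries about.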

4 Answers
时光不老,我们不散
#2 · 2019-01-26 20:48

The accepted answer already provides you with a simple way to do this. That works fine and gives you a nice result. However, if you really need to convert large values to a string, there is a better way.

I will not go into details, because my solution is written in Delphi, which many readers can't easily read, and it is pretty long (several functions of 100+ lines of code, calling yet other functions, etc., which cannot be explained in a simple answer, especially because the conversion handles different number bases differently).

But the principle is to divide the number into two almost equally sized halves, by a number that is a power of 10. To convert these, recursively cut them into two smaller parts again, by a smaller power of 10, etc., until the size of the parts reaches some lower limit (say, 32 bits), which you then finally convert the conventional way, i.e. as in the accepted answer.

The partial conversions are then "concatenated" (actually, the digits are placed into the single buffer at the correct address directly), so at the end, you get one huge string of digits.

This is a bit tricky, and I only mention it for those who want to investigate this for extremely large numbers. It doesn't make sense for numbers with fewer than, say, 100 digits.

This is a recursive method, indeed, but not one that simply divides by 10.

The size of the buffer can be precalculated, by doing something like

bufSize = myBigInt.bitCount() * Math.log10(2) + some_extra_to_be_sure;

I use a precalculated table for the different number bases, but that is an implementation detail.

For very large numbers, this will be much faster than a loop that repeatedly divides by 10, especially since in that loop, the entire number must be divided by 10 every time, and it only gets smaller very slowly. The divide-and-conquer algorithm only divides ever smaller numbers, and the total number of (costly) divisions to cut the parts is far lower (log N instead of N, is my guess). So: fewer divisions, on (on average) much smaller numbers.

cf. Brent, Zimmermann, "Modern Computer Arithmetic", algorithm 1.26
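Purely as an illustration, here is the splitting principle shrunk to a 64-bit word with a fixed split at 10^9. A real big-integer version would split near half the digit count, as algorithm 1.26 does, and recurse until the parts fit in a machine word; the function name and layout here are my own.

```c
/* Cut the value by a power of 10 into high and low halves, convert
   each half recursively, and convert small pieces the conventional
   way.  Interior (low) pieces must keep their leading zeros. */
static char *split_convert(char *p, unsigned long long n, int pad) {
    if (n >= 1000000000ULL) {
        p = split_convert(p, n / 1000000000ULL, pad); /* high half */
        p = split_convert(p, n % 1000000000ULL, 9);   /* low half, zero-padded */
    } else {
        char tmp[16];
        int len = 0;
        do { tmp[len++] = (char)('0' + n % 10); n /= 10; } while (n > 0);
        while (len < pad) tmp[len++] = '0'; /* pad interior pieces to 9 digits */
        while (len > 0) *p++ = tmp[--len];  /* tmp holds the digits reversed */
    }
    return p;
}
```

As in the answer above, each piece's digits are written directly into the single output buffer at the correct position, so the partial results are "concatenated" for free.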

My code and explanations can be found here, if you want to see how it works: BigIntegers unit

我命由我不由天
#3 · 2019-01-26 20:48

Would this be the correct way?

The 2nd method does not work for all integer values in C. if divisor < dividend relies on creating divisor as a power of 10 greater than (or equal to) the dividend. Since most integer types have a finite range, creating such a power of 10 when dividend == INTEGER_MAX is not possible (unless INTEGER_MAX is itself a power of 10).


A recursive method works by performing repeated division by 10 and deferring the digit assignment until the more significant digits are determined. This approach works well when the size of the destination buffer is unknown yet adequate.

The below handles signed int and works for INT_MIN too without undefined behavior.

#include <limits.h>  // CHAR_BIT, INT_MIN

// Return location of next char to write
// Note: value is expected to be <= 0
static char *itoa_helper(char *s, int value) {
  if (value/10) {
    s = itoa_helper(s, value/10);
  }
  *s = '0' - value % 10;  // C99
  return s+1;
}

void itoa(int n, char *s) {
  if (n < 0) {
    *s++ = '-';
  } else {
    n = -n;
  }
  *itoa_helper(s, n) = '\0';
}

#define INT_SIZEMAX  ((CHAR_BIT*sizeof(int) - 1)*28/93 + 3)
char buf[INT_SIZEMAX];
itoa(INT_MIN, buf);

Rather than converting negative numbers to positive ones, this code does the opposite as -INT_MIN fails on most systems.

一夜七次
#4 · 2019-01-26 20:57

I came across a similar problem and did not find any solution to my liking, so I came up with my own. The idea is to convert your BigInt, whatever its base, to another BigInt whose base is a power of 10, as large as possible but still smaller than your current base. That one you can convert "digit" by "digit" using library calls, and concatenate the result. So no explicit division is ever involved; it is only hidden inside the library functions. Still, the overall complexity is quadratic (just like the other division-based solutions).

friend std::ostream& operator<<(std::ostream& out, const BigInt_impl& x){
    using Big10 = BigInt_impl<char32_t, uint64_t, 1000000000>; // 1e9 is the max power of 10 smaller than BASE
    auto big10 = Big10(0);
    auto cm = Big10(1);
    for(size_t i = 0; i < x.digits.size(); ++i, cm *= BASE){
        big10 += cm*x.digits[i];
    }
    out << big10.digits.back();
    for(auto it = next(big10.digits.rbegin()); it != big10.digits.rend(); ++it){ 
        out << std::setfill('0') << std::setw(9) << *it;
    }
    return out;
}

Watch out for the magic constant 1e9 in this solution: it is specific to my case of BASE = 2^32. I was too lazy to derive it properly.

(Sorry for the C++; I just realized the question was about C, but I would still like to leave the code here, maybe as an illustration of the idea.)

疯言疯语
#5 · 2019-01-26 21:00

One (fairly common) way to do this, whether for big integer or normal integer types, is to repeatedly divide the number by 10, saving the remainder as the next digit (starting with the least significant). Keep going until the number reaches zero. Since the first digit found is the least significant, you may need to reverse the string at the end, or build it in reverse as you go.

An example using ordinary unsigned int might look like:

#include <limits.h> // CHAR_BIT
#include <stdio.h>  // puts

void printUInt(unsigned x) {
  char buf[(sizeof(x) * CHAR_BIT) / 3 + 2]; // slightly oversize buffer
  char *result  = buf + sizeof(buf) - 1; // index of next output digit

  // add digits to result, starting at 
  //   the end (least significant digit)

  *result = '\0'; // terminating null
  do {
    *--result = '0' + (x % 10);  // remainder gives the next digit
    x /= 10;
  } while (x); // keep going until x reaches zero

  puts(result);
}

The process is pretty much the same for a big integer -- though it would be best to do the division and find the remainder in one step if you can.

The above example builds the string from the end of the buffer (so result ends up pointing in the middle of the buffer somewhere), but you could also build it from the start and reverse it afterward.

You can estimate the size needed for the output if you can determine the number of bits used in your original number (about 1 additional digit per 3 bits -- slightly less).
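For the "division and remainder in one step" suggestion: assuming a big integer stored as a little-endian array of 32-bit limbs (a hypothetical layout, not any specific library; adapt to your own representation), one pass over the limbs yields both quotient and remainder:

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint32_t, uint64_t */

/* Divide the big integer in place by 10 and return the remainder.
   Walks the limbs from most significant to least, carrying the
   running remainder down into each 64-bit partial dividend. */
static uint32_t divmod10(uint32_t *limbs, size_t n) {
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t cur = (rem << 32) | limbs[i]; /* carry remainder down */
        limbs[i] = (uint32_t)(cur / 10);
        rem = cur % 10;
    }
    return (uint32_t)rem;
}
```

Calling this repeatedly and collecting each remainder as a digit, until every limb is zero, is exactly the loop in printUInt lifted to the multi-limb representation.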
