How can I pick a random value between 0 and a bigi

Posted 2019-06-15 07:40

I have a combinatorics problem for which I want to be able to pick an integer at random between 0 and a big integer.


Inadequacies of my current approach

Now for regular integers I would usually write something like int rand 500; and be done with it.

But for big integers, it looks like rand isn't meant for this.

Using the following code, I ran a simulation of 2 million calls to rand $bigint:

$ perl -Mbigint -E 'say int rand 1230138339199329632554990773929330319360000000 for 1 .. 2e6' > rand.txt

The distribution of the resultant set is far from desirable:

  • 0 (56 counts)
  • magnitude 1e+040 (112 counts)
  • magnitude 1e+041 (1411 counts)
  • magnitude 1e+042 (14496 counts)
  • magnitude 1e+043 (146324 counts)
  • magnitude 1e+044 (1463824 counts)
  • magnitude 1e+045 (373777 counts)

So the process was never able to choose a number like 999, or 5e+020, which makes this approach unsuitable for what I want to do.

It looks like this has something to do with the limited precision of rand, which never went beyond 15 digits in my testing:

$ perl -E 'printf "%.66g", rand'
0.307037353515625

How can I overcome this limitation?

My initial thought is that maybe there is a way to influence the precision of rand, but it feels like a band-aid to a much bigger problem (i.e. the inability of rand to handle big integers).

In any case, I'm hoping someone has walked down this path before and knows how to remedy the situation.

3 Answers
We Are One
#2 · 2019-06-15 08:26

I was looking at this problem from the wrong angle

The bins are not the same size. Each bin is 10 times the size of the previous one. To put this in perspective, there are 10,000 possible integers at magnitude 1e+44 for every integer with magnitude 1e+40.

The probability of drawing any number with magnitude 1e+20 when the bigint is around 1e+45 is less than 0.00000 00000 00000 00000 01 % (about 7e-23 %).

Forget needles in haystacks, this is more like finding a needle in a quasar!
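
To make the bin-size argument concrete, here is a minimal sketch that computes how many of 2 million uniform draws should land in each magnitude bin. The helper name expected_count is made up for illustration, and a bin "magnitude 1e+k" is assumed to mean the range [10**k, 10**(k+1)), clipped at the upper bound.

```perl
use strict;
use warnings;
use Math::BigInt;

my $n     = Math::BigInt->new('1230138339199329632554990773929330319360000000');
my $draws = 2e6;

# Hypothetical helper: expected number of draws in [10**k, 10**(k+1))
# when sampling uniformly from [0, $n).
sub expected_count {
    my ($k) = @_;
    my $lo = Math::BigInt->new(10)->bpow($k);
    my $hi = Math::BigInt->new(10)->bpow($k + 1);
    $hi = $n->copy if $hi > $n;          # clip the last bin at the upper bound
    return 0 if $lo >= $hi;              # bin lies entirely above the bound
    my $width = $hi->copy->bsub($lo);
    # Double precision is enough for a ratio, so convert with numify.
    return $width->numify / $n->numify * $draws;
}

printf "magnitude 1e+%03d: %.6g expected counts\n", $_, expected_count($_)
    for 20, 40 .. 45;
```

Under these assumptions the expected counts for magnitudes 1e+44 and 1e+45 come out close to the counts observed in the question, which suggests the simulated distribution was roughly what a uniform draw produces.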

Viruses.
#3 · 2019-06-15 08:39

(Converted from my comment)

A more theoretically driven approach is to call the PRNG several times to gather enough random bits for the number you want to sample. Care is needed when the number of bits the PRNG produces is not equal to the number of bits required, as outlined below.

Pseudocode

  • Calculate the bits needed to represent your number: n_needed_bits
  • Check the size of bits returned by your PRNG: n_bits_prng
  • Calculate the number of samples needed: needed_prng_samples = ceil(n_needed_bits / n_bits_prng)
  • While true:
    • Sample needed_prng_samples (calls to PRNG) times & concatenate all the bits obtained
    • Check if the resulting number is within your range
    • Yes?: return number (finished)
    • No?: do nothing (loop continues; will resample all components again!)
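
The steps above can be sketched in Perl roughly as follows. This is an illustration, not a vetted implementation: rand_bigint is a made-up name, and the figure of 15 usable bits per rand call is a conservative assumption (some builds deliver more). Unlike the literal pseudocode, the sketch discards surplus bits beyond n_needed_bits before the range check, which keeps the acceptance probability above 1/2 without affecting uniformity.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: uniform random integer in [0, $n) by rejection sampling.
sub rand_bigint {
    my ($n) = @_;
    $n = Math::BigInt->new("$n");
    die "upper bound must be a positive integer\n" unless $n > 0;

    my $bits_per_call = 15;   # n_bits_prng (assumption, see above)
    # n_needed_bits: bit length of the largest allowed value, $n - 1
    my $needed_bits = length($n->copy->bdec->as_bin) - 2;   # strip "0b"
    my $calls = int(($needed_bits + $bits_per_call - 1) / $bits_per_call);

    while (1) {
        # Concatenate the bits of $calls independent PRNG draws.
        my $bin = '';
        $bin .= sprintf '%0*b', $bits_per_call, int rand(2**$bits_per_call)
            for 1 .. $calls;
        # Keep only the bits needed, then accept or resample everything.
        my $x = Math::BigInt->new('0b' . substr($bin, 0, $needed_bits));
        return $x if $x < $n;
    }
}

my $n = Math::BigInt->new('1230138339199329632554990773929330319360000000');
print rand_bigint($n), "\n";
```

The quality of the result still depends entirely on the quality of rand; for anything security-related a different bit source would be needed.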

Remarks

  • This is a form of acceptance sampling / rejection sampling
  • The approach is a Las Vegas type of algorithm: the runtime is not bounded in theory
    • The number of loops needed is on average: n_possible-sample-numbers-of-full-concatenation / n_possible-sample-numbers-within-range
  • Resampling everything when the result is not within range, as the rejection method requires, is a very important aspect of this approach: it is what makes a formal analysis of non-bias / uniformity possible
  • Of course, the classic assumptions about PRNG output are needed to make this work.
    • If the PRNG, for example, has some non-uniformity in its low bits / high bits (as is often mentioned), this will have an effect on the output above
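
The average loop count from the remarks can be worked out directly. The sketch below assumes the literal pseudocode (all concatenated bits are used, none discarded); expected_loops is a hypothetical helper name, and the bit widths 15 and 32 are example values, not properties of any particular PRNG.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: average number of loop iterations,
# i.e. 2**(bits concatenated per attempt) / (size of the target range).
sub expected_loops {
    my ($n, $bits_per_call) = @_;
    $n = Math::BigInt->new("$n");
    my $needed_bits = length($n->copy->bdec->as_bin) - 2;   # strip "0b"
    my $calls = int(($needed_bits + $bits_per_call - 1) / $bits_per_call);
    my $total = Math::BigInt->new(2)->bpow($calls * $bits_per_call);
    return $total->numify / $n->numify;
}

my $n = '1230138339199329632554990773929330319360000000';
printf "%2d bits per call: %.3f loops on average\n", $_, expected_loops($n, $_)
    for 15, 32;
```

For this 150-bit bound, ten 15-bit draws fit exactly and give about 1.16 loops on average, whereas five 32-bit draws produce 160 bits and raise the average to about 1188. That gap is the reason care is needed when the bit counts do not match.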
混吃等死
#4 · 2019-06-15 08:42

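One approach is to cut the string representation of the number into chunks and draw each chunk separately. A boolean ($low) starts out false and stays false as long as every chunk drawn so far equals the corresponding chunk of the upper bound; once a draw falls below its chunk, $low becomes true and the remaining chunks may take any value.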

EDIT: added some explanations following comment

# first argument (in): upper bound for this chunk
# second argument (in/out): "is lower" flag (false while every draw so far equals its upper bound; once set, it stays true)
sub randhlp {
    my($upp)=@_;
    my $l=length $upp;
    # draw a random number that is
    # - at most the upper bound if the "is lower" flag is false
    # - at most 9..99 ($l digits) otherwise
    my $x=int rand ($_[1] ? 10**$l : $upp+1);
    if ($x<$upp) {
        $_[1]=1;
    }
    # left padding with 0
    return sprintf("%0*d",$l,$x);
}

# returns a random number less than argument (numeric string)
sub randistr {
    my($n)=@_;
    $n=~/^\d+$/ or die "invalid input not numeric";
    $n ne "0" or die "invalid input 0";
    my($low,$x);
    do {
        undef $x;
        # split string by chunks of 6 characters
        # except last chunk which has 1 to 6 characters
        while ($n=~/.{1,6}/g) {
            # concatenate random results
            $x.=randhlp($&,$low)
        }
    } while ($x eq $n);
    $x=~s/^0+//;
    return $x;
}

The test

for ($i=0;$i<2e6;++$i) {
    $H{length(randistr("1230138339199329632554990773929330319360000000"))}+=1;
}

print "$_ $H{$_}\n" for sort keys %H;

Returns

39 4
40 61
41 153
42 1376
43 14592
44 146109
45 1463301
46 374404