Generate unique random numbers in Postgresql with

2019-01-26 10:49发布

问题:

I need to generate unique random numbers in Postgresql with a fixed length of 13 digits. I've found a similar thread where was used a sequence encrypted using "pseudo_encrypt", but the returned number was not with a fixed length.

So, what i need is: get an encrypted random sequence with a fixed length of 13 digits where the min value is 0000000000001 and a max value is 9999999999999.

Is it possible? If start with the zeros in front is not possible is not a big problem (i think), i can set them programmatically during the reading from the db, but would be great if Postgresql can do it by itself.

-- EDIT --

After have realized some useful things i must change the question in order to explain better what i need:

I need to generate unique random numbers (bigint) in Postgresql with a fixed max length of 13 digits. Actually i'm trying to use pseudo_encrypt function (64 bit), but the returned number obviously is not with a fixed max length of 13, in the 32 bit case the max length is 10 digits (int), and for the 64 bit is 19 (bigint).

So, how to get an encrypted random sequence with a fixed max length of 13 digits, where the min value is 1 and the max value is 9999999999999 ?

Is it possible to modify the 64 bit pseudo_ecrypt function in order to get this result? Or if is not possible, there are other methods to get an unique sequence with this requirements?

Pseudo Encrypt function (64bit)

CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE bigint) returns bigint   AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= (VALUE >> 32) & 4294967295::bigint;
r1:= VALUE & 4294967295;
WHILE i < 3 LOOP
    l2 := r1;
    r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::int;
    l1 := l2;
    r1 := r2;
    i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

回答1:

Tweaking the existing function for N < 64 bits values

It's relatively simple to tweak the bigint variant to reduce the output to 2^N values, where N is even, and less than 64.

To get 13 decimal digits, consider the maximum N for which 2^N has 13 digits. That's N=42, with 2^42=4398046511104.

The algorithm works by breaking the input value into two halves with an equal number of bits, and make them flow through the Feistel network, essentially XOR'ing with the result of the round function and swapping halves at each iteration.

If at every stage of the process, each half is limited to 21 bits then the result combining both halves is guaranteed not to exceed 42 bits.

So here's my proposed variant:

CREATE OR REPLACE FUNCTION pseudo_encrypt42(VALUE bigint) returns bigint
 AS $$
DECLARE
  l1 bigint;
  l2 bigint;
  r1 bigint;
  r2 bigint;
  i int:=0;
  b21 int:=(1<<21)-1; -- 21 bits mask for a half-number => 42 bits total
BEGIN
  l1:= VALUE >> 21;
  r1:= VALUE & b21;
  WHILE i < 3 LOOP
    l2 := r1;
    r2 := l1 # (((((1366*r1+150889)%714025)/714025.0)*32767*32767)::int & b21);
    l1 := l2;
    r1 := r2;
    i := i + 1;
  END LOOP;
  RETURN ((l1::bigint << 21) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

The input must be less than (2^42)-1, otherwise the outputs will collide , as pseudo_encrypt42(x) = pseudo_encrypt42(x mod 2^42).

What can be done about the missing numbers between 2^42 and 10^13 ?

2^42 - 10^13 = 5601953488896 so that's quite a lot of missing numbers. I don't know how to help with that in a single pass with the Feistel network. One workaround that might be acceptable, though, is to generate another set of unique values in 0..M and add 2^42 to them, so there's no risk of collision.

This another set could be obtained by the same function, just with the offset added. 4398046511104 + pseudo_encrypt42(x) is guaranteed to be between 4398046511104 and 2*4398046511104 = 8796093022208 unique values so that's closer to the goal. The same technique could be applied with several other ranges, not even necessarily of the same size.

However this workaround degrades the random-looking behavior , as instead of having a single output range where every number can be between 0 and X, you'd get N distinct output ranges of X/N numbers. With several distinct partitions like that, it's easy to guess in what partition the output will be, just not what value inside the partition.