Generate unique random strings in plpgsql

2019-07-23 17:02发布

问题:

I am trying to write a function to create unique random tokens of variable length. However, I am stumped by the plpgsql syntax. My intention is to create a function which

  • Takes a table and column as input
  • Generates a random string of a given length, with a given set of characters
  • Checks if the string is already in the colum
  • If so (and this is expected to be rare), simply generate a new random string.
  • Otherwise, return the random string

My current attempt looks like this:

CREATE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
  alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
  range_head CONSTANT integer := 25;
  range_tail CONSTANT integer := 33;
  random_string text;
BEGIN
  REPEAT
    SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
      array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
      INTO random_string FROM generate_series(1, _length - 1);
  UNTIL random_string NOT IN FORMAT('SELECT %I FROM %I WHERE %I = random_string;', _column, _table, _column)
  END REPEAT;
  RETURN random_string;
END
$$ LANGUAGE plpgsql;

However, this doesn't work, and gives me a not very helpful error:

DatabaseError: error 'ERROR: syntax error at or near "REPEAT"

I have tried a number of variations, but without knowing what the error in the syntax is I am stumped. Any idea how to fix this function?

回答1:

There is no repeat statement in plpgsql. Use simple loop.

CREATE OR REPLACE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
  alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
  range_head CONSTANT integer := 25;
  range_tail CONSTANT integer := 33;
  random_string text;
  ct int;
BEGIN
  LOOP
    SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
      array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
      INTO random_string FROM generate_series(1, _length - 1);
    EXECUTE FORMAT('SELECT count(*) FROM %I WHERE %I = %L', _table, _column, random_string) INTO ct;
    EXIT WHEN ct = 0;
  END LOOP;
  RETURN random_string;
END
$$ LANGUAGE plpgsql;

Note, random_string should be a parameter to format().

Update. According to the accurate hint from Abelisto, this should be faster for a large table:

DECLARE
  dup boolean;
...
    EXECUTE FORMAT('SELECT EXISTS(SELECT 1 FROM %I WHERE %I = %L)', _table, _column, random_string) INTO dup;
    EXIT WHEN NOT dup;
...


回答2:

This is almost certainly not what you want. When you say, "checks if the string is already in the column" you're not referring to something that looks unique, you're referring to something that actually is UNIQUE.

Instead, I would point you over this answer I gave about UUIDs.