SQL Server Generate UNIQUE random string

Published 2020-08-05 11:34

Question:

With the help of Zohar's answer, I have a SQL function that generates random strings, but I am running into duplicates.

Query

CREATE FUNCTION [dbo].[MaskGenerator]
(
    @Prefix nvarchar(4000), -- use null or an empty string for no prefix
    @Suffix nvarchar(4000), -- use null or an empty string for no suffix
    @MinLength int, -- the minimum length of the random part
    @MaxLength int, -- the maximum length of the random part
    @Count int, -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @CharType tinyint -- 1, 2 and 4 stand for lower-case, upper-case and digits.
                      -- a bitwise combination of these values can be used to generate all possible combinations:
                      -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
)
RETURNS TABLE
AS 
RETURN 

-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000 

SELECT TOP(@Count)  N As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
                CASE @CharType 
                    WHEN 1 THEN LOWER
                    WHEN 2 THEN UPPER
                    WHEN 3 THEN IIF(Rnd % 2 = 0, LOWER, UPPER)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF(Rnd % 2 = 0, LOWER, Digit)
                    WHEN 6 THEN IIF(Rnd % 2 = 0, UPPER, Digit)
                    WHEN 7 THEN 
                        CASE Rnd % 3
                            WHEN 0 THEN LOWER
                            WHEN 1 THEN UPPER
                            ELSE Digit
                        END
                END
        FROM Tally As T0  
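        -- create a random number from the guid using the GuidGenerator view;
        -- CHECKSUM is cast to bigint because ABS() of the smallest int (-2147483648) overflows
        CROSS APPLY (SELECT ABS(CAST(CHECKSUM(NewGuid) AS bigint)) As Rnd FROM GuidGenerator) AS RAND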
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As LOWER, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As UPPER,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) AS Chars
        WHERE  T0.N <> -T1.N -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As T1 
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 N As Length
    FROM Tally As T2
    CROSS JOIN GuidGenerator 
    WHERE T2.N >= @MinLength
    AND T2.N <= @MaxLength
    AND T2.N <> T1.N -- as above, correlates with T1 so the subquery is re-evaluated for each row
    ORDER BY NewGuid
) As Lengths;
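The function reads NewGuid from a GuidGenerator view that the question does not show. NEWID() is a side-effecting function and cannot be called directly inside a user-defined function, so wrapping it in a one-row view is the usual workaround. A minimal sketch of what that view is assumed to look like (only the names GuidGenerator and NewGuid come from the function; the body is an assumption), presumably created before the function:

```sql
-- Assumed definition of the view used by dbo.MaskGenerator.
-- NEWID() cannot be called directly inside a function, so the function
-- selects the GUID from this view instead.
CREATE VIEW dbo.GuidGenerator
AS
SELECT NEWID() AS NewGuid;
GO
```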

The function above returns random strings based on its parameters. For example, the query below generates 100 random strings with the prefix Test_Product_. The result set contains duplicate values, which need to be excluded. I have tried applying ROW_NUMBER, but it slows the query down, and I also don't get back the requested number of rows.

SELECT * FROM dbo.MaskGenerator('Test_Product_',null,1,4,100,4) ORDER BY 2

I have made a demo here: SQL Fiddle, and my attempt is here as well.
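A quick way to see the duplicates is to group the function's output and keep only the strings that occur more than once. With these arguments (digits only, random part of length 1 to 4) there are only 10 possible one-digit values, so if the lengths are spread roughly evenly, about a quarter of the 100 rows compete for those 10 values and repeats are practically guaranteed. A sketch, using the same call as above:

```sql
-- List every random string that appears more than once in a single call.
SELECT RandomString, COUNT(*) AS Occurrences
FROM dbo.MaskGenerator('Test_Product_', NULL, 1, 4, 100, 4)
GROUP BY RandomString
HAVING COUNT(*) > 1
ORDER BY Occurrences DESC, RandomString;
```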

Answer 1:

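Basically, this is an effect of the birthday problem: when many values are drawn at random from a limited set, repeats show up much sooner than intuition suggests.
The best solution I can offer as of now is to generate twice as many random strings as you need, then select the top 100 distinct values from them: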

SELECT TOP 100 RandomString, ROW_NUMBER() OVER(ORDER BY @@SPID) As Number
FROM 
(
  SELECT DISTINCT RandomString 
  FROM dbo.MaskGenerator('Test_Product_',null,1,4,200,4)
) As Rnd
ORDER BY RandomString
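A rough estimate for this particular call suggests why doubling is usually enough. It assumes the random length is spread evenly over 1 to 4 (about n/4 strings of each length, where n is the @Count argument) and that every digit is uniform; both are simplifications. With digits only, a random part of length L has N = 10^L possible values, 11,110 in total across lengths 1 to 4, so no amount of oversampling can produce more distinct strings than that. Drawing m values uniformly from N possibilities gives, on average, this many distinct values:

$$
E[D] = N\left(1 - \left(1 - \frac{1}{N}\right)^{m}\right)
$$

Summing over the four lengths with m = n/4 (terms listed for N = 10, 100, 1000, 10000):

$$
\begin{aligned}
n = 100 &: \quad 9.28 + 22.22 + 24.70 + 24.97 \approx 81 \text{ distinct strings} \\
n = 200 &: \quad 9.95 + 39.50 + 48.79 + 49.88 \approx 148 \text{ distinct strings}
\end{aligned}
$$

So under these assumptions, asking for 100 strings yields only about 81 distinct ones, while asking for 200 typically leaves enough headroom to keep 100 distinct values.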

This might seem like a waste, since you're generating twice as many random strings as you need. However:

  1. I'm not sure that's actually the case. The query optimizer might just stop execution once you have 100 distinct values.

  2. Performance tests I've run on this function (on a relatively powerful SQL Server 2016 machine) show that it is lightning-fast, at least for small numbers of strings:

    • Generating 200 strings took around 23 milliseconds on average.
    • Generating 2000 strings took around 55 milliseconds on average.
    • Generating 100,000 strings took around 2.8 seconds on average.

Generating 1 million strings, however, took around 30 seconds on average.
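The answer does not say how these timings were taken. One possible way to reproduce measurements like them is to materialize the output into a temporary table and time the statement. The temp table name #Strings and the use of SYSDATETIME() for timing are choices made for this sketch, not the answerer's method:

```sql
-- Sketch of a timing harness; #Strings is a hypothetical temp table name.
-- Writing RandomString into a table makes sure every string is actually built
-- (a query that never reads the column might let the optimizer skip that work).
DECLARE @Start datetime2(7) = SYSDATETIME();

SELECT RandomString
INTO #Strings
FROM dbo.MaskGenerator('Test_Product_', NULL, 1, 4, 2000, 4);

SELECT COUNT(*) AS RowsGenerated,
       DATEDIFF(MILLISECOND, @Start, SYSDATETIME()) AS ElapsedMs
FROM #Strings;

DROP TABLE #Strings;
```

Changing the @Count argument (2000 here) times other sizes; absolute numbers will depend on the hardware.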