Most Efficient Way to… Unique Random String

2020-07-19 03:33发布

问题:

I need to efficently insert a 5 character RANDOM string into a database while also ensuring that it is UNIQUE. Generating the random string is not the problem, but currently what I am doing is generating the string and then checking the DB if it exists already... if it does, I start over.

Is there a more efficient way to do this process?

Please note, I do NOT want to use GUID or anything else that is more than 5 Characters.... I MUST stick to 5 Characters.

PS: I don't think it makes a difference, but my strings are all case sensitive.

Here is the "Random String" portion

    Public Function GetRandomNumbers(ByVal numChars As Integer) As String
    Dim chars As String() = { _
     "A", "B", "C", "D", "E", "F", _
     "G", "H", "I", "J", "K", "L", _
     "M", "N", "O", "P", "Q", "R", _
     "S", "T", "U", "V", "W", "X", _
     "Y", "Z", "0", "1", "2", "3", _
     "4", "5", "6", "7", "8", "9", _
     "a", "b", "c", "d", "e", "f", _
     "g", "h", "i", "j", "k", "l", _
     "m", "n", "o", "p", "q", "r", _
     "s", "t", "u", "v", "w", "x", _
     "y", "z"}
    Dim rnd As New Random()
    Dim random As String = String.Empty
    Dim i As Integer = 0
    While i < numChars
        random += chars(rnd.[Next](0, 62))
        System.Math.Max(System.Threading.Interlocked.Increment(i), i - 1)
    End While
    Return random
End Function

回答1:

Create a table with a big pool of 5-character strings that are added in sequence (so they are unique), and have a GUID as their primary key. Add a column to indicate whether they are used or not.

When you need a new number, you select top 1 from the pool, order by the guid (so it becomes random), and set the result as "spent".



回答2:

You could generate a GUID and only use the first 5 characters?



回答3:

Is randomness more important, or is uniqueness more important? -- note that I said "more" important; I get the fact that you need both.

If randomness is more important, then you're going to need some way to track historical values. The database itself (with an appropriate index) is going to be the best way to do this.

If uniqueness is more important, then simply use a counter and zero-pad it to five digits. This will, of course, limit you to 100,000 rows, so you could alternatively use a counter and a transformation into character space (eg, 1 = "A", 2 = "B", 27 = "AA", and so on).



回答4:

There is a method to pick unused unique words by random, but it's probably not going to be any better than what you are doing now.

The principle is that you determine which permutations of the words that are unused, generate a random number based on how many unused permuations there are, and pick that one.

If you for example would use a word with three characters, and only the character 0 and 1, there are eight possible permutations. If you already used the combinations "010" and "100", you would get something that looks like this:

PI = permutation index
UI = unused permutation index

No. PI UI
----------
000 0  0
001 1  1
010 2  -
011 3  2
100 4  -
101 5  3
110 6  4
111 7  5

To pick an unused permutation, you simply generate a random number from 0 to 5, and pick the corresponding permutation.

Keeping a list of all possible permuations is of course not practical, so you would need a function that can determine the permutation index from the string, and one function that can determine the string from the permutation index.

Also, to determine which permutations are unused, you have to check which are used, so you still have to query the table at some point.



回答5:

If you are inserting the string to an existing, populated, table then you will always need to check if the string doesn't exist there (it doesn't have to be an explicit SELECT). You can either to it manually, or have an UNIQUE constraint on the column and let the database to do it. So if the database returns an error because the string is already there, generate another one.

Note that if you have an empty table and want to populate it with multiple random strings, it's a different problem.



回答6:

I think you should stick to your origional idea. Putting a unique constraint on the index and letting the database check/report dupes for you would be fairly effecient method of dupe checking but this assumption depends on some information not provided like the number of rows and likelyhood of encountering dupes with randomly selected data.

Fully pre-populating a unique sequence pool with your parameters requires a 459 million row table.

You could use a bloom filter to load managable statistics into a database or main memory and avoid dupes but depending on the row count and filter configuration this may lead to saturation of the filter when the row count is an appreciable percentage of the 459 million limit.. Since the filter can report false positives you should work to ensure that you don't get into a situation where your system is stuck trying permutations that pass the filter approaching forever.



回答7:

As you know how long your word shall be, why not a tree-based approach? (Let's call it randomised tree walk)

Say your word has n characters. Generate a list of all symbols s in S and relate a counter for each symbol and possible position in the string, essentially a matrix M of dimensions s times n. Now roll your dice and choose the first letter and lookup M(s,1). If M(s,1) is larger or equal the number of possible words starting with s, roll again. Otherwise increment M(s,1).

Repeat this for every letter 1 till n.

Should be pretty fast until you have used up to many words.