I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9]
in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me have a method that generates a unique random string that requires at least one number, one uppercase and one downcase character?
If you want a script to generate just some small amount of tokens (like 2, 5, 10, 100, 1000, 10 000, etc), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this wont take long). If this is not the case - keep reading.
After thinking about it, this problem turned out to be in fact very interenting. For brievety, I will not mention the requirement to have at least one number, capital and lower case letters, but it will be included in the final solution. Also let
all = [*'1'..'9', *'a'..'z', *'A'..'Z']
.To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint. k = 10, n = 61 (
all.size
)Ruby just so happens to have such method, it's
Array#repeated_permutation
. So everything is great, we can just use:and pop the resulting strings one by one, right? Wrong! The problem is that the amount of possibilities happens to be:
k^n = 10000000000000000000000000000000000000000000000000000000000000 (
10**61
).Even if you had an infinetelly fast processor, you still can't hold such amount of data, no matter if this was the count of complex objects or simple bits.
The opposite would be to generate random permutations, keep the already generated in a set and make checks for inclusion before returning the next element. This is just delaying the innevitable - not only you would still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.
As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand to hand.
Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.
An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where
n = rand(100)
), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equaly distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.Now lets explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 (
(61 - 10 + 1).upto(61).reduce(:*)
)Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (
1.upto(61).reduce(:*)
)The number of possible k-combinations of n elements without repetition:
n!/k!(n-k)! = 90_177_170_226 (
(61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*)
)Finally, where we might have a break through with full permutation of k elements without repetition:
k! = 3_628_800 (
1.upto(10).reduce(:*)
)~3.5m isn't nothing, but at least it's reasonably computable. On my personal laptop
k_permutations = 0.upto(9).to_a.permutation.to_a
took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A singlek_permutations.shuffle
took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations:k_seeds = 61.times.map { k_permutations.shuffle }.to_a
.Now lets try to convert the problem of k-permutations of n elements without repetition to solving multiple times full k-permutations without repetitions.
A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a
1
on positioni
, we will use theall[i]
element, otherwise we will skip it. We still didn't escape the issue as 2^61 = 2305843009213693952 (2**61
) which we can't keep in memory.Fortunatelly, another cool trick comes to the rescue, this time from number theory.
In other words:
Now actually, how random is that? As it turns out - the more common divisors shared by m and the selected m numbers, the less evenly distributed the sequence is. But we are at luck here - 61^2 - 1 is a prime number (also called Mersenne prime). Therefore, the only divisors it can share are 1 and 61^2 - 1. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 61^2 - 3 numbers can be found at any position. And guess what - we don't care about 0 and 1 anyway, because they don't have 10
1
s in their binary representation!Unfortunatelly, a bottleneck for our randomness is the fact that the bigger prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order, without keeping them in memory simultaneously.
So to put everything in use:
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
EDIT: it turned out that our constraints eliminate too much possibilities. This causes
@all_numbers_generator
to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.The old version that generates tokens with uniqueness constraint on the containing characters:
[Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:
down.sample.shift
(etc.) would have been more compact thanextract1
, but the inefficiency was just too much to bear.If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:
where
t = C(62,10)
andC(62,10)
is defined below.An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least lowercase letter, one uppercase letter and one digit. We can do that as follows:
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e, selecting 10 characters at random from the 62 elements of
all
, without replacement, that has no lowercase letters, no uppercase letters or no digits, is only about 0.15. (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:
The following shows how the above probability was derived, should anyone be interested.
Let
n_down
be the number of ways a sample of 10 can be drawn that has no lowercase letters:where (the binomial coefficient)
C(36,10)
equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:Similarly,
and
We can add these three numbers together to obtain:
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contains no lowercase letters characters, uppercase letters or digits. "Not quite" because there is a bit of double-counting going on. The necessary adjustment is as follows:
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
Use 'SafeRandom' Gem GithubLink
It will provide the easiest way to generate random values for Rails 2, Rails 3, Rails 4, Rails 5 compatible.
Here you can use the strong_string method to generate a strong combination of string ( ie combination of the alphabet(uppercase, downcase), number, and symbols