Unique random string with alphanumberic required i

2019-09-12 08:31发布

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:

random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join

However, sometimes this random string does not contain a number or an uppercase character. Could you help me have a method that generates a unique random string that requires at least one number, one uppercase and one downcase character?

标签: ruby random
3条回答
Melony?
2楼-- · 2019-09-12 08:43

If you want a script to generate just some small amount of tokens (like 2, 5, 10, 100, 1000, 10 000, etc), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this wont take long). If this is not the case - keep reading.


After thinking about it, this problem turned out to be in fact very interenting. For brievety, I will not mention the requirement to have at least one number, capital and lower case letters, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].

To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint. k = 10, n = 61 (all.size)

Ruby just so happens to have such method, it's Array#repeated_permutation. So everything is great, we can just use:

all.repeated_permutation(10).to_a.map(&join).shuffle

and pop the resulting strings one by one, right? Wrong! The problem is that the amount of possibilities happens to be:

k^n = 10000000000000000000000000000000000000000000000000000000000000 (10**61).

Even if you had an infinetelly fast processor, you still can't hold such amount of data, no matter if this was the count of complex objects or simple bits.

The opposite would be to generate random permutations, keep the already generated in a set and make checks for inclusion before returning the next element. This is just delaying the innevitable - not only you would still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.

As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand to hand.


Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.

An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equaly distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.


Now lets explore the boundaries of our problem a little:

The number of possible k-permutations of n elements without repetition is:

n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))

Still out of reach. Same goes for

The number of possible full permutations of n elements without repetition:

n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))

The number of possible k-combinations of n elements without repetition:

n!/k!(n-k)! = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))

Finally, where we might have a break through with full permutation of k elements without repetition:

k! = 3_628_800 (1.upto(10).reduce(:*))

~3.5m isn't nothing, but at least it's reasonably computable. On my personal laptop k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.


Now lets try to convert the problem of k-permutations of n elements without repetition to solving multiple times full k-permutations without repetitions.

A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a 1 on position i, we will use the all[i] element, otherwise we will skip it. We still didn't escape the issue as 2^61 = 2305843009213693952 (2**61) which we can't keep in memory.

Fortunatelly, another cool trick comes to the rescue, this time from number theory.

Any m consecutive numbers raised to the power of a prime number by modulo of m give the numbers from 0 to m - 1

In other words:

5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]

Now actually, how random is that? As it turns out - the more common divisors shared by m and the selected m numbers, the less evenly distributed the sequence is. But we are at luck here - 61^2 - 1 is a prime number (also called Mersenne prime). Therefore, the only divisors it can share are 1 and 61^2 - 1. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 61^2 - 3 numbers can be found at any position. And guess what - we don't care about 0 and 1 anyway, because they don't have 10 1s in their binary representation!

Unfortunatelly, a bottleneck for our randomness is the fact that the bigger prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order, without keeping them in memory simultaneously.


So to put everything in use:

  1. We generate seeds of full permutations of 10 elements.
  2. We generate a random prime number.
  3. We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
  4. We use bitmaps of the generated numbers to get said permutations.

Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.


DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:

require 'prime'

class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK     = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds

  def initialize
    random_prime = Prime.take(10_000).drop(100).sample

    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
      !(number & HAS_LOWER_CASE_MASK).zero? and
      !(number & HAS_UPPER_CASE_MASK).zero? and
      number.to_s(2).chars.count('1') == 10
    end

    @k_permutation_seeds  = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = {go_fish: nil}
  end

  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample

    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end

  private
  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # lol, you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end

  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end

  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end

EDIT: it turned out that our constraints eliminate too much possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.


The old version that generates tokens with uniqueness constraint on the containing characters:

numbers          = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters   = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set  = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join
查看更多
再贱就再见
3楼-- · 2019-09-12 08:59
down   = ('a'..'z').to_a
up     = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all    = down + up + digits
[down.sample, up.sample, digits.sample].
  concat(7.times.map { all.sample }).
  shuffle.
  join
  #=> "TioS8TYw0F"

[Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:

def rnd_str
  down   = ('a'..'z').to_a
  up     = ('A'..'Z').to_a
  digits = ('0'..'9').to_a
  [extract1(down), extract1(up), extract1(digits)].
    concat(((down+up+digits).sample(7))).shuffle.join
end

def extract1(arr)
  i = arr.size.times.to_a.sample
  c = arr[i]
  arr.delete_at(i)
  c
end

rnd_str #=> "YTLe0WGoa1" 
rnd_str #=> "NrBmAnE9bT"

down.sample.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.

If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:

t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
  #=> 1.39e-07

where t = C(62,10) and C(62,10) is defined below.

An alternative

There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least lowercase letter, one uppercase letter and one digit. We can do that as follows:

DOWN   = ('a'..'z').to_a
UP     = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL    = DOWN + UP + DIGITS

def rnd_str
  loop do
    arr = ALL.sample(10)
    break arr.shuffle.join unless (DOWN&&arr).empty? || (UP&&arr).empty? || 
    (DIGITS&&arr).empty?
  end
end

rnd_str #=> "3jRkHcP7Ge" 
rnd_str #=> "B0s81x4Jto

How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e, selecting 10 characters at random from the 62 elements of all, without replacement, that has no lowercase letters, no uppercase letters or no digits, is only about 0.15. (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.

It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:

0.15/0.85 =~ 0.17

The following shows how the above probability was derived, should anyone be interested.

Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:

n_down = C(36,10) = 36!/(10!*(36-10)!)

where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:

C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856

Similarly,

n_up = n_down #=> 254_186_856

and

n_digits = C(52,10) #=> 15_820_024_220

We can add these three numbers together to obtain:

n_down + n_up + n_digits #=> 16_328_397_932

This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contains no lowercase letters characters, uppercase letters or digits. "Not quite" because there is a bit of double-counting going on. The necessary adjustment is as follows:

n_down + n_up + n_digits - 2*C(26,10) - 3
  #=> 16_317_774_459

To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:

(16_317_774_459.0/c(62,10)).round(2)
  #=> 0.15
查看更多
时光不老,我们不散
4楼-- · 2019-09-12 09:06

Use 'SafeRandom' Gem GithubLink

It will provide the easiest way to generate random values for Rails 2, Rails 3, Rails 4, Rails 5 compatible.

Here you can use the strong_string method to generate a strong combination of string ( ie combination of the alphabet(uppercase, downcase), number, and symbols

# => Strong string: Minumum number should be greater than 5 otherwise by default 8 character string.
require 'safe_random'
puts SafeRandom.strong_string                   # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3)                # => P4eUbcK%
puts SafeRandom.strong_string(5)                # => 5$Rkdo
查看更多
登录 后发表回答