How can I randomly iterate through a large Range?

2019-03-11 13:06发布

I would like to randomly iterate through a range. Each value will be visited only once and all values will eventually be visited. For example:

class Array
    def shuffle
        ret = dup
        j = length
        i = 0
        while j > 1
            r = i + rand(j)
            ret[i], ret[r] = ret[r], ret[i]
            i += 1
            j -= 1
        end
        ret
    end
end

(0..9).to_a.shuffle.each{|x| f(x)}

where f(x) is some function that operates on each value. A Fisher-Yates shuffle is used to efficiently provide random ordering.

My problem is that shuffle needs to operate on an array, which is not cool because I am working with astronomically large numbers. Ruby will quickly consume a large amount of RAM trying to create a monstrous array. Imagine replacing (0..9) with (0..99**99). This is also why the following code will not work:

tried = {} # store previous attempts
bigint = 99**99
bigint.times {
    x = rand(bigint)
    redo if tried[x]
    tried[x] = true
    f(x) # some function
}

This code is very naive and quickly runs out of memory as tried obtains more entries.

What sort of algorithm can accomplish what I am trying to do?

[Edit1]: Why do I want to do this? I'm trying to exhaust the search space of a hash algorithm for a N-length input string looking for partial collisions. Each number I generate is equivalent to a unique input string, entropy and all. Basically, I'm "counting" using a custom alphabet.

[Edit2]: This means that f(x) in the above examples is a method that generates a hash and compares it to a constant, target hash for partial collisions. I do not need to store the value of x after I call f(x) so memory should remain constant over time.

[Edit3/4/5/6]: Further clarification/fixes.

[Solution]: The following code is based on @bta's solution. For the sake of conciseness, next_prime is not shown. It produces acceptable randomness and only visits each number once. See the actual post for more details.

N = size_of_range
Q = ( 2 * N / (1 + Math.sqrt(5)) ).to_i.next_prime
START = rand(N)

x = START
nil until f( x = (x + Q) % N ) == START # assuming f(x) returns x

10条回答
唯我独甜
2楼-- · 2019-03-11 13:20

you can randomly iterate an array with shuffle method

a = [1,2,3,4,5,6,7,8,9]
a.shuffle!
=> [5, 2, 8, 7, 3, 1, 6, 4, 9]
查看更多
干净又极端
3楼-- · 2019-03-11 13:20

You want what's called a "full cycle iterator"...

Here is psudocode for the simplest version which is perfect for most uses...

function fullCycleStep(sample_size, last_value, random_seed = 31337, prime_number = 32452843) {
if last_value = null then last_value = random_seed % sample_size
    return (last_value + prime_number) % sample_size
}

If you call this like so:

sample = 10
For i = 1 to sample
    last_value = fullCycleStep(sample, last_value)
    print last_value
next

It would generate random numbers, looping through all 10, never repeating If you change random_seed, which can be anything, or prime_number, which must be greater than, and not be evenly divisible by sample_size, you will get a new random order, but you will still never get a duplicate.

查看更多
戒情不戒烟
4楼-- · 2019-03-11 13:25

Break the range in to manageable batches as shown below:

def range_walker range, batch_size = 100
  size = (range.end - range.begin) + 1
  n = size/batch_size 
  n.times  do |i|
    x = i * batch_size + range.begin
    y = x + batch_size
    (x...y).sort_by{rand}.each{|z| p z}
  end
  d = (range.end - size%batch_size + 1)
  (d..range.end).sort_by{rand}.each{|z| p z }
end

You can further randomize solution by randomly choosing the batch for processing.

PS: This is a good problem for map-reduce. Each batch can be worked by independent nodes.

Reference:

Map-reduce in Ruby

查看更多
叛逆
5楼-- · 2019-03-11 13:26

Database systems and other large-scale systems do this by writing the intermediate results of recursive sorts to a temp database file. That way, they can sort massive numbers of records while only keeping limited numbers of records in memory at any one time. This tends to be complicated in practice.

查看更多
男人必须洒脱
6楼-- · 2019-03-11 13:30

I just remembered a similar problem from a class I took years ago; that is, iterating (relatively) randomly through a set (completely exhausting it) given extremely tight memory constraints. If I'm remembering this correctly, our solution algorithm was something like this:

  1. Define the range to be from 0 to some number N
  2. Generate a random starting point x[0] inside N
  3. Generate an iterator Q less than N
  4. Generate successive points x[n] by adding Q to the previous point and wrapping around if needed. That is, x[n+1] = (x[n] + Q) % N
  5. Repeat until you generate a new point equal to the starting point.

The trick is to find an iterator that will let you traverse the entire range without generating the same value twice. If I'm remembering correctly, any relatively prime N and Q will work (the closer the number to the bounds of the range the less 'random' the input). In that case, a prime number that is not a factor of N should work. You can also swap bytes/nibbles in the resulting number to change the pattern with which the generated points "jump around" in N.

This algorithm only requires the starting point (x[0]), the current point (x[n]), the iterator value (Q), and the range limit (N) to be stored.

Perhaps someone else remembers this algorithm and can verify if I'm remembering it correctly?

查看更多
贼婆χ
7楼-- · 2019-03-11 13:35

How "random" does your order have to be? If you don't need a specific input distribution, you could try a recursive scheme like this to minimize memory usage:

def gen_random_indices
  # Assume your input range is (0..(10**3))
  (0..3).sort_by{rand}.each do |a|
    (0..3).sort_by{rand}.each do |b|
      (0..3).sort_by{rand}.each do |c|
        yield "#{a}#{b}#{c}".to_i
      end
    end
  end
end

gen_random_indices do |idx|
  run_test_with_index(idx)
end

Essentially, you are constructing the index by randomly generating one digit at a time. In the worst-case scenario, this will require enough memory to store 10 * (number of digits). You will encounter every number in the range (0..(10**3)) exactly once, but the order is only pseudo-random. That is, if the first loop sets a=1, then you will encounter all three-digit numbers of the form 1xx before you see the hundreds digit change.

The other downside is the need to manually construct the function to a specified depth. In your (0..(99**99)) case, this would likely be a problem (although I suppose you could write a script to generate the code for you). I'm sure there's probably a way to re-write this in a state-ful, recursive manner, but I can't think of it off the top of my head (ideas, anyone?).

查看更多
登录 后发表回答