Randomly sampling unique subsets of an array

2019-06-17 06:47发布

问题:

If I have an array:

a = [1,2,3]

How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a the possible subsets would be:

[]
[1]
[2]
[3]
[1,2]
[2,3]
[1,2,3]

I can't generate all of the possible subsets as the real size of a is very big so there are many, many subsets. At the moment, I am using a 'random walk' idea - for each element of a, I 'flip a coin' and include it if the coin comes up heads - but I am not sure if this actually uniformly samples the space. It feels like it biases towards the middle, but this might just be my mind doing pattern-matching, as there will be more middle sized possiblities.

Am I using the right approach, or how should I be randomly sampling?

(I am aware that this is more of a language agnostic and 'mathsy' question, but I felt it wasn't really Mathoverflow material - I just need a practical answer.)

回答1:

Just go ahead with your original "coin flipping" idea. It uniformly samples the space of possibilities.

It feels to you like it's biased towards the "middle", but that's because the number of possibilities is largest in the "middle". Think about it: there is only 1 possibility with no elements, and only 1 with all elements. There are N possibilities with 1 element, and N possibilities with (N-1) elements. As the number of elements chosen gets closer to (N/2), the number of possibilities grows very quickly.



回答2:

You could generate random numbers, convert them to binary and choose the elements from your original array where the bits were 1. Here is an implementation of this as a monkey-patch for the Array class:

class Array
  def random_subset(n=1)
    raise ArgumentError, "negative argument" if n < 0
    (1..n).map do
      r = rand(2**self.size)
      self.select.with_index { |el, i| r[i] == 1 }
    end
  end
end

Usage:

a.random_subset(3) 
#=> [[3, 6, 9], [4, 5, 7, 8, 10], [1, 2, 3, 4, 6, 9]]

Generally this doesn't perform so bad, it's O(n*m) where n is the number of subsets you want and m is the length of the array.



回答3:

I think the coin flipping is fine.

ar = ('a'..'j').to_a
p ar.select{ rand(2) == 0 }

An array with 10 elements has 2**10 possible combinations (including [ ] and all 10 elements) which is nothing more then 10 times (1 or 0). It does output more arrays of four, five and six elements, because there are a lot more of those in the powerset.



回答4:

A way to select a random element from the power set is the following:

my_array = ('a'..'z').to_a
power_set_size = 2 ** my_array.length
random_subset = rand(power_set_size)
subset = []
random_subset.to_i(2).chars.each_with_index do |bit, corresponding_element|
  subset << my_array[corresponding_element] if bit == "1"
end

This makes use of strings functions instead than working with real "bits" and bitwise operations just for my convenience. You can turn it into a faster (I guess) algorithm by using real bits.

What it does, is to encode the powerset of array as an integer between 0 and 2 ** array.length and then picks one of those integers at random (uniformly random, indeed). Then it decodes back the integer into a particular subset of array using a bitmask (1 = the element is in the subset, 0 = it is not).

In this way you have an uniform distribution over the power set of your array.



回答5:

a.select {|element| rand(2) == 0 }

For each element, a coin is flipped. If heads ( == 0), then it is selected.