How to create a list of random integer vector whos

2019-01-18 06:35发布

Creating a random vector whose sum is X (e.g. X=1000) is fairly straight forward:

import random
def RunFloat():
    Scalar = 1000
    VectorSize = 30
    RandomVector = [random.random() for i in range(VectorSize)]
    RandomVectorSum = sum(RandomVector)
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
    return RandomVector
RunFloat()

The code above create a vector whose values are floats and sum is 1000.

I'm having difficulty creating a simple function for creating a vector whose values are integers and sum is X (e.g. X=1000*30)

import random
def RunInt():
    LowerBound = 600
    UpperBound = 1200
    VectorSize = 30
    RandomVector = [random.randint(LowerBound,UpperBound) for i in range(VectorSize)]
    RandomVectorSum = 1000*30
    #Sanity check that our RandomVectorSum is sensible/feasible
    if LowerBound*VectorSize <= RandomVectorSum and RandomVectorSum <= UpperBound*VectorSum:
        if sum(RandomVector) == RandomVectorSum:
            return RandomVector
        else:
            RunInt()  

Does anyone have any suggestions to improve on this idea? My code might never finish or run into recursion depth problems.

Edit (July 9, 2012)

Thanks to Oliver, mgilson, and Dougal for their inputs. My solution is shown below.

  1. Oliver was very creative with the multinomial distribution idea
  2. Put simply, (1) is very likely to output certain solutions more so than others. Dougal demonstrated that the multinomial solution space distribution is not uniform or normal by a simple test/counter example of Law of Large Numbers. Dougal also suggested to use numpy's multinomial function which saves me a lot of trouble, pain, and headaches.
  3. To overcome (2)'s output issue, I use RunFloat() to give what appears (I haven't tested this so its just a superficial appearance) to be a more uniform distribution. How much of a difference does this make compared to (1)? I don't really know off-hand. It's good enough for my use though.
  4. Thanks again to mgilson for the alternative method that does not use numpy.

Here is the code that I have made for this edit:

Edit #2 (July 11,2012)

I realized that the normal distribution is not correctly implemented, I have since modified it to the following:

import random
def RandFloats(Size):
    Scalar = 1.0
    VectorSize = Size
    RandomVector = [random.random() for i in range(VectorSize)]
    RandomVectorSum = sum(RandomVector)
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
    return RandomVector

from numpy.random import multinomial
import math
def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
    """
    Inputs:
    ListSize = the size of the list to return
    ListSumValue = The sum of list values
    Distribution = can be 'uniform' for uniform distribution, 'normal' for a normal distribution ~ N(0,1) with +/- 5 sigma  (default), or a list of size 'ListSize' or 'ListSize - 1' for an empirical (arbitrary) distribution. Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).  
    Output:
    A list of random integers of length 'ListSize' whose sum is 'ListSumValue'.
    """
    if type(Distribution) == list:
        DistributionSize = len(Distribution)
        if ListSize == DistributionSize or (ListSize-1) == DistributionSize:
            Values = multinomial(ListSumValue,Distribution,size=1)
            OutputValue = Values[0]
    elif Distribution.lower() == 'uniform': #I do not recommend this!!!! I see that it is not as random (at least on my computer) as I had hoped
        UniformDistro = [1/ListSize for i in range(ListSize)]
        Values = multinomial(ListSumValue,UniformDistro,size=1)
        OutputValue = Values[0]
    elif Distribution.lower() == 'normal':
        """
        Normal Distribution Construction....It's very flexible and hideous
        Assume a +-3 sigma range.  Warning, this may or may not be a suitable range for your implementation!
        If one wishes to explore a different range, then changes the LowSigma and HighSigma values
        """
        LowSigma    = -3#-3 sigma
        HighSigma   = 3#+3 sigma
        StepSize    = 1/(float(ListSize) - 1)
        ZValues     = [(LowSigma * (1-i*StepSize) +(i*StepSize)*HighSigma) for i in range(int(ListSize))]
        #Construction parameters for N(Mean,Variance) - Default is N(0,1)
        Mean        = 0
        Var         = 1
        #NormalDistro= [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues]
        NormalDistro= list()
        for i in range(len(ZValues)):
            if i==0:
                ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                NormalDistro.append(ERFCVAL)
            elif i ==  len(ZValues) - 1:
                ERFCVAL = NormalDistro[0]
                NormalDistro.append(ERFCVAL)
            else:
                ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2))
                ERFCVAL = ERFCVAL1 - ERFCVAL2
                NormalDistro.append(ERFCVAL)  
            #print "Normal Distribution sum = %f"%sum(NormalDistro)
            Values = multinomial(ListSumValue,NormalDistro,size=1)
            OutputValue = Values[0]
        else:
            raise ValueError ('Cannot create desired vector')
        return OutputValue
    else:
        raise ValueError ('Cannot create desired vector')
    return OutputValue
#Some Examples        
ListSize = 4
ListSumValue = 12
for i in range(100):
    print RandIntVec(ListSize, ListSumValue,Distribution=RandFloats(ListSize))

The code above can be found on github. It is part of a class I built for school. user1149913, also posted a nice explanation of the problem.

7条回答
来,给爷笑一个
2楼-- · 2019-01-18 07:06

Just to give you another approach, implement a partition_function(X) and randomly choose a number between 0 and the length of partition_function(1000) and there you have it. Now you just need to find an efficient way to calculate a partition function. These links might help:

http://code.activestate.com/recipes/218332-generator-for-integer-partitions/

http://oeis.org/A000041

EDIT: Here is a simple code:

import itertools
import random
all_partitions = {0:set([(0,)]),1:set([(1,)])}

def partition_merge(a,b):
    c = set()
    for t in itertools.product(a,b):
        c.add(tuple(sorted(list(t[0]+t[1]))))
    return c

def my_partition(n):
    if all_partitions.has_key(n):
        return all_partitions[n]
    a = set([(n,)])
    for i in xrange(1,n/2+1):
        a = partition_merge(my_partition(i),my_partition(n-i)).union(a)
    all_partitions[n] = a
    return a

if __name__ == '__main__':
    n = 30
    # if you have a few years to wait uncomment the next line
    # n = 1000
    a = my_partition(n)
    i = random.randint(0,len(a)-1)
    print(list(a)[i])
查看更多
登录 后发表回答