Generate a large number of random card decks - NumPy

Posted 2019-01-29 05:13

Question:

I need to generate a large number of random poker card decks. Speed is important so everything has to be in numpy matrix form.

I understand I can generate two cards from a deck as follows:

np.random.choice(12*4,2, replace=False)

How can I run the same draw many times so that a 2D array is created without a for loop? The difficulty is that each round deals from the full original deck, so cards may repeat across rows but must not repeat within a row.

I've also tried it with

originalDeck=np.arange(12*4)
np.random.shuffle(originalDeck)

But here as well, we would need to build a 2D array from originalDeck and then shuffle each row? Is this possible?

Answer 1:

You can simulate the behavior of np.random.choice(..., replace=False) with a trick based on argsort/argpartition. The idea is simple: create an array of random values and sort it. The resulting sort indices are unique, so they behave like a draw from np.random.choice(..., replace=False).

Since we want a 2D array with this property, start with a random 2D array and, for performance, use np.argpartition to get the indices of the M smallest values along each row (M = 2 simulates picking two cards).

Thus, we would have a vectorized approach like so -

# N : Number of queries
# M : Number of cards to be picked
out = np.argpartition(np.random.rand(N,12*4),M,axis=1)[:,:M]
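
As a rough, self-contained sketch of the same trick: the function name deal_hands and the use of np.random.default_rng are my own choices for illustration, not part of the original answer. It also falls back to a full argsort when a whole shuffled deck per row is wanted, since argpartition needs M to be smaller than the deck size.

```python
import numpy as np

def deal_hands(n_hands, n_cards, deck_size=12 * 4, rng=None):
    """Deal n_hands independent hands of n_cards distinct cards each."""
    # deal_hands is a hypothetical helper name used only for this sketch.
    rng = np.random.default_rng() if rng is None else rng
    # One row of independent uniform random keys per hand.
    keys = rng.random((n_hands, deck_size))
    if n_cards == deck_size:
        # A full argsort gives one complete shuffled deck per row.
        return np.argsort(keys, axis=1)
    # Column indices of the n_cards smallest keys in each row. They are
    # distinct because every index occurs exactly once per row.
    return np.argpartition(keys, n_cards, axis=1)[:, :n_cards]

hands = deal_hands(5, 2, rng=np.random.default_rng(0))       # shape (5, 2)
decks = deal_hands(3, 12 * 4, rng=np.random.default_rng(0))  # shape (3, 48)
```

Each row of hands holds two different card indices, and each row of decks is a permutation of 0..47.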

Runtime test -

In [55]: # Input params
    ...: N = 1000000 # Number of queries
    ...: M = 2 # Number of cards to be picked
    ...: 
    ...: def original_app(N,M):
    ...:     out = np.empty((N,M),dtype=int)
    ...:     for i in range(N):
    ...:         out[i] = np.random.choice(12*4,M, replace=False)
    ...:     return out
    ...: 
    ...: def vectorized_app(N,M):
    ...:     return np.argpartition(np.random.rand(N,12*4),M,axis=1)[:,:M]
    ...: 

In [56]: %timeit original_app(N,M)
1 loops, best of 3: 12.7 s per loop

In [57]: %timeit vectorized_app(N,M)
1 loops, best of 3: 678 ms per loop


Answer 2:

Since you are only looking for pairs of cards, there are only 1128 possible pairs (without replacement), so you could generate all pairs and then pick random rows from this set:

import numpy as np
from itertools import combinations
# There may be a better way to generate all possible pairs in NumPy,
# but I am not aware of one, and this is fast enough at this size.
all_pairs = np.array(list(combinations(range(12 * 4), 2)))
cards = all_pairs[np.random.randint(all_pairs.shape[0], size = N_PAIRS), :]

Where N_PAIRS is the number of pairs you want.
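
For what it's worth, one possible pure-NumPy way to build the same lookup table is np.triu_indices, which lists every (i, j) with i < j. This is only a sketch: the names all_pairs_numpy and draw_pairs are made up for illustration, and it uses the np.random.default_rng API rather than np.random.randint.

```python
import numpy as np

DECK_SIZE = 12 * 4

def all_pairs_numpy(deck_size=DECK_SIZE):
    # Indices of the strict upper triangle, i.e. all (i, j) with i < j.
    i, j = np.triu_indices(deck_size, k=1)
    return np.column_stack((i, j))

def draw_pairs(n_pairs, rng=None):
    # Hypothetical helper: sample rows of the lookup table uniformly.
    rng = np.random.default_rng() if rng is None else rng
    table = all_pairs_numpy()
    # The same pair may be drawn in several rows, which is what we want
    # because every round starts from the full deck.
    return table[rng.integers(table.shape[0], size=n_pairs)]

pairs_table = all_pairs_numpy()                         # shape (1128, 2)
sample = draw_pairs(10, rng=np.random.default_rng(0))   # shape (10, 2)
```

Note that in this table (as with itertools.combinations) the first card of each pair always has the smaller index, so if the order of the two cards matters you would need to swap the columns at random.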

Benchmarks:

In [55]: # Input params
    ...: N = 1000000 # Number of queries
    ...: M = 2 # Number of cards to be picked
    ...: 
    ...: def original_app(N,M):
    ...:     out = np.empty((N,M),dtype=int)
    ...:     for i in range(N):
    ...:         out[i] = np.random.choice(12*4,M, replace=False)
    ...:     return out
    ...: 
    ...: def vectorized_app(N,M):
    ...:     return np.argpartition(np.random.rand(N,12*4),M,axis=1)[:,:M]
    ...: 
    ...: def itertools_app(N,M):
    ...:     all_pairs = np.array(list(combinations(range(12 * 4), M)))
    ...:     return all_pairs[np.random.randint(all_pairs.shape[0], size = N), :]

In [46]: %timeit original_app(N,M)
1 loops, best of 3: 10.8 s per loop

In [47]: %timeit vectorized_app(N,M)
1 loops, best of 3: 618 ms per loop

In [48]: %timeit itertools_app(N,M)
10 loops, best of 3: 24.8 ms per loop

This method is really fast when M is very small. As M gets bigger, the number of combinations grows very quickly, so building the all_pairs array soon becomes impractical (already with M = 5 there are ~1700000 possible combinations).
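
To make the growth concrete, here is a small check of the table size for the 48-card deck used above, using math.comb (available from Python 3.8):

```python
import math

DECK_SIZE = 12 * 4

# Number of distinct M-card hands from the 48-card deck, for M = 2..5.
hand_counts = {m: math.comb(DECK_SIZE, m) for m in range(2, 6)}
for m, count in hand_counts.items():
    print(m, count)
# 2 1128
# 3 17296
# 4 194580
# 5 1712304
```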



Answer 3:

Another simple approach, slightly slower than @Holt's best solution.

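def vectorized_app(N):
    # Draw about 3% more pairs than needed, since some will be rejected.
    # Passing the 2D shape directly avoids a reshape error for odd counts.
    u = np.random.randint(0, 12*4, (N*103//100 + 1, 2))
    w = np.not_equal(*u.T)  # keep only pairs with two different cards
    return u[w][:N]  # can hold fewer than N rows if too many pairs were rejected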
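
A pair of independent draws from 48 cards collides with probability 1/48 (about 2.1%), so a 3% margin is usually enough but not guaranteed. As a hedged sketch, a variant that keeps topping up until it has exactly N valid pairs could look like this (draw_distinct_pairs is a made-up name, and it uses np.random.default_rng instead of np.random.randint):

```python
import numpy as np

def draw_distinct_pairs(n_pairs, deck_size=12 * 4, rng=None):
    # Hypothetical helper: rejection sampling with a top-up loop.
    rng = np.random.default_rng() if rng is None else rng
    chunks, have = [], 0
    while have < n_pairs:
        need = n_pairs - have
        # Oversample by about 3%, plus a small constant for tiny batches.
        tries = need + need * 3 // 100 + 8
        u = rng.integers(0, deck_size, size=(tries, 2))
        # Reject pairs where both draws hit the same card.
        valid = u[u[:, 0] != u[:, 1]]
        chunks.append(valid)
        have += len(valid)
    return np.concatenate(chunks)[:n_pairs]

pairs = draw_distinct_pairs(100_000, rng=np.random.default_rng(0))
```

The loop almost always finishes in one or two passes, so the cost stays close to the single-pass version.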