Given the following NumPy array,
> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])
it's simple enough to shuffle a single row,
> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])
Is it possible to use indexing notation to shuffle each of the rows independently? Or do you have to iterate over the array? I had in mind something like,
> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output
though this clearly doesn't work.
You have to call numpy.random.shuffle() several times, because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest code to shuffle all rows of a two-dimensional array a separately is probably
list(map(numpy.random.shuffle, a))
In Python 3, map() is lazy, so its result has to be consumed (here with list()) for the shuffles to actually run.
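An explicit `for` loop is an equivalent, arguably more readable way to do the same per-row, in-place shuffle. A minimal sketch follows; `shuffle_rows_inplace` is a hypothetical helper name, not part of NumPy. It relies on the fact that iterating over a 2-D array yields row views, so shuffling each view modifies the original array.

```python
import numpy as np

def shuffle_rows_inplace(a):
    # Hypothetical helper: shuffle every row of a 2-D array independently.
    for row in a:               # iterating a 2-D array yields 1-D row views
        np.random.shuffle(row)  # shuffling a view modifies `a` in place

a = np.array([[1, 2, 3, 4, 5]] * 3)
shuffle_rows_inplace(a)
print(a)  # each row is an independent permutation of 1..5
```

This still makes one Python-level call per row, so for arrays with many rows a vectorized approach can be faster.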
Vectorized solution with rand+argsort trick
We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate those indices, we use the random-float + argsort trick: argsort-ing an array of random floats along an axis gives an independent random permutation of 0..n-1 for every slice along that axis, which gives us a vectorized solution. We also generalize it to cover generic n-dim arrays and arbitrary axes with np.take_along_axis. The final implementation would look something like this -
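import numpy as np

def shuffle_along_axis(a, axis):
    # argsort of i.i.d. random floats = random permutation of each slice along `axis`
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)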
Note that this shuffle is not in-place; it returns a shuffled copy.
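To make the two steps of the trick concrete, here is a small sketch on the sample array below, restricted to `axis=1`. The variable names `idx`, `rows`, `via_fancy` and `via_take` are illustrative only. It shows that each row of the argsort result is a permutation of the column indices, and that for a 2-D array `np.take_along_axis(..., axis=1)` gathers the same elements as explicit advanced indexing with a broadcast row index. Exact ties between random floats are possible in principle but vanishingly rare.

```python
import numpy as np

a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])

# Step 1: one independent uniform float per element; argsort along axis=1
# turns each row of floats into a random permutation of 0..3.
idx = np.random.rand(*a.shape).argsort(axis=1)

# Step 2: gather with those per-row indices. For axis=1 this matches
# explicit advanced indexing with a row index that broadcasts against idx.
rows = np.arange(a.shape[0])[:, None]   # shape (3, 1)
via_fancy = a[rows, idx]
via_take = np.take_along_axis(a, idx, axis=1)

print(idx)
print(np.array_equal(via_fancy, via_take))  # True
```

`np.take_along_axis` is what lets the same code work for any axis and any number of dimensions without building the broadcast index arrays by hand.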
Sample run -
In [33]: a
Out[33]:
array([[18, 95, 45, 33],
[40, 78, 31, 52],
[75, 49, 42, 94]])
In [34]: shuffle_along_axis(a, axis=0)
Out[34]:
array([[75, 78, 42, 94],
[40, 49, 45, 52],
[18, 95, 31, 33]])
In [35]: shuffle_along_axis(a, axis=1)
Out[35]:
array([[45, 18, 33, 95],
[31, 78, 52, 40],
[42, 75, 94, 49]])
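The answer states that the function generalizes to n-dim arrays and generic axes. A quick way to check that claim, sketched here under the assumption that a property check is enough (the output is random, so exact values cannot be compared), is to shuffle a 3-D `np.arange` array along each axis and confirm that sorting along that axis restores the input. The function is repeated so the snippet runs on its own; `b` and `original` are illustrative names.

```python
import numpy as np

def shuffle_along_axis(a, axis):
    # Copy of the function above, repeated so this snippet is self-contained.
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

# 2 blocks x 3 rows x 4 columns, values 0..23: increasing along every axis.
b = np.arange(24).reshape(2, 3, 4)
original = b.copy()

for axis in range(b.ndim):
    out = shuffle_along_axis(b, axis)
    # Every 1-D slice along `axis` is a permutation of the matching input
    # slice, so sorting along that axis recovers the (sorted) input.
    assert np.array_equal(np.sort(out, axis=axis), b)

assert np.array_equal(b, original)  # the input was not modified
print("ok")
```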