I have a very large 2D numpy array that contains 2x2 subsets that I need to take the average of. I am looking for a way to vectorize this operation. For example, given x:
import numpy as np

#              |- col 0 -| |- col 1 -| |- col 2 -|
x = np.array([[ 0.0,  1.0,  2.0,  3.0,  4.0,  5.0],   # row 0
              [ 6.0,  7.0,  8.0,  9.0, 10.0, 11.0],   # row 0
              [12.0, 13.0, 14.0, 15.0, 16.0, 17.0],   # row 1
              [18.0, 19.0, 20.0, 21.0, 22.0, 23.0]])  # row 1
I need to end up with a 2x3 array whose elements are the averages of each 2x2 subarray, i.e.:
result = np.array([[ 3.5,  5.5,  7.5],
                   [15.5, 17.5, 19.5]])
So element [0,0] is calculated as the average of x[0:2, 0:2], while element [0,1] would be the average of x[0:2, 2:4]. Does numpy have vectorized/efficient ways of doing aggregates on subsets like this?
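If we form the reshaped array
y = x.reshape(2, 2, 3, 2)
then the (i, j) 2x2 submatrix is given by y[i, :, j, :]. E.g., y[0, :, 1, :] is the block [[2.0, 3.0], [8.0, 9.0]], which is x[0:2, 2:4].

To get the mean of the 2x2 submatrices, use the mean method with axis=(1, 3), i.e. y.mean(axis=(1, 3)).

If you are using an older version of numpy that doesn't support using a tuple for the axis, you could apply mean twice instead: y.mean(axis=1).mean(axis=-1).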
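As a quick check, here is a minimal sketch of the steps above applied to the x from the question. It builds x with np.arange (which yields the same values), and the names avg and avg_old are only illustrative:

```python
import numpy as np

# Same values as the 4x6 array x in the question.
x = np.arange(24, dtype=float).reshape(4, 6)

# Axes: (block row, row within block, block column, column within block).
y = x.reshape(2, 2, 3, 2)

# The (0, 1) block is x[0:2, 2:4].
print(y[0, :, 1, :])
# [[2. 3.]
#  [8. 9.]]

# Average over the two "within block" axes.
avg = y.mean(axis=(1, 3))
print(avg)
# [[ 3.5  5.5  7.5]
#  [15.5 17.5 19.5]]

# Equivalent form for numpy versions without tuple-axis support.
avg_old = y.mean(axis=1).mean(axis=-1)
print(np.allclose(avg, avg_old))  # True
```

The output matches the result array the question asks for.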
See the link given by @dashesy in a comment for more background on the reshaping "trick".
To generalize this to a 2-d array x with shape (m, n), where m and n are even, use
y = x.reshape(m//2, 2, n//2, 2)
y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-d array act as the indices that select one of the 2x2 blocks. To get the upper left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access block (j, k), use y[j, :, k, :].

To compute the reduced array of averages of these blocks, use the mean method with axis=(1, 3) (i.e. average over axes 1 and 3):
y.mean(axis=(1, 3))
For example, if x has shape (8, 10), then y = x.reshape(4, 2, 5, 2); individual blocks can be inspected with y[j, :, k, :], and y.mean(axis=(1, 3)) gives the array of averages of the 2x2 blocks, which has shape (4, 5).
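The data from the original (8, 10) example is not reproduced here, so the following is a minimal sketch with stand-in values from np.arange. The helper name block_mean is hypothetical (it is not a numpy function), and the even-dimension check is an added assumption:

```python
import numpy as np

def block_mean(x):
    """Average each 2x2 block of a 2-d array with even dimensions."""
    m, n = x.shape
    if m % 2 or n % 2:
        raise ValueError("both dimensions must be even")
    return x.reshape(m // 2, 2, n // 2, 2).mean(axis=(1, 3))

# Stand-in data with shape (8, 10); any values would do.
x = np.arange(80, dtype=float).reshape(8, 10)
m, n = x.shape
y = x.reshape(m // 2, 2, n // 2, 2)  # shape (4, 2, 5, 2)

# Take a look at a couple of the 2x2 blocks.
print(y[0, :, 0, :])  # upper left block, x[0:2, 0:2]
# [[ 0.  1.]
#  [10. 11.]]
print(y[1, :, 2, :])  # second row, third column of blocks, x[2:4, 4:6]
# [[24. 25.]
#  [34. 35.]]

# Compute the averages of the blocks.
avg = block_mean(x)
print(avg.shape)             # (4, 5)
print(avg[0, 0], avg[1, 2])  # 5.5 29.5
```

When x is contiguous, the reshape returns a view rather than a copy, so the only real work is the reduction itself.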