I am using NumPy to handle some large data matrices (of around ~50GB in size). The machine where I am running this code has 128GB of RAM so doing simple linear operations of this magnitude shouldn't be a problem memory-wise.
However, I am witnessing a huge memory growth (to more than 100GB) when computing the following code in Python:
import numpy as np
# memory allocations (everything works fine)
a = np.zeros((1192953, 192, 32), dtype='f8')
b = np.zeros((1192953, 192), dtype='f8')
c = np.zeros((192, 32), dtype='f8')
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
Please note that initial memory allocations are done without any problems. However, when I try to perform the subtract operation with broadcasting, the memory grows to more than 100GB. I always thought that broadcasting would avoid making extra memory allocations but now I am not sure if this is always the case.
As such, can someone give some details on why this memory growth is happening, and how the following code could be rewritten using more memory efficient constructs?
I am running the code in Python 2.7 within IPython Notebook.
@rth's suggestion to do the operation in smaller batches is a good one. You could also try using the function np.subtract
and give it the destination array to avoid creating an addtional temporary array. I also think you don't need to index c
as c[np.newaxis, :, :]
, because it is already a 3-d array.
So instead of
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
try
np.subtract(b[:, :, np.newaxis], c, a)
The third argument of np.subtract
is the destination array.
Well, your array a
takes already 1192953*192*32* 8 bytes/1.e9 = 58 GB
of memory.
The broadcasting does not make additional memory allocations for the initial arrays, but the result of
b[:, :, np.newaxis] - c[np.newaxis, :, :]
is still saved in a temporary array. Therefore at this line, you have allocated at least 2 arrays with the shape of a
for a total memory used >116 GB
.
You can avoid this issue, by operating on a smaller subset of your array at one time,
CHUNK_SIZE = 100000
for idx in range(b.shape[0]/CHUNK_SIZE):
sl = slice(idx*CHUNK_SIZE, (idx+1)*CHUNK_SIZE)
a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]
this will be marginally slower, but uses much less memory.