I have a huge (30GB) ndarray memory-mapped:
arr = numpy.memmap(afile, dtype=numpy.float32, mode="w+", shape=(n, n,))
After filling it in with some values (which goes very fine - max memory usage is below 1GB) I want to calculate standard deviation:
print('stdev: {0:4.4f}\n'.format(numpy.std(arr)))
This line fails with a MemoryError.

I am not sure why it fails. I would be grateful for tips on how to calculate the standard deviation in a memory-efficient way.
Environment: venv + Python3.6.2 + NumPy 1.13.1
Indeed, NumPy's implementations of std and mean make full copies of the array and are horribly memory-inefficient. Here is a better implementation:
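One memory-efficient approach is a two-pass computation over fixed-size chunks: one pass for the mean, one for the sum of squared deviations. The sketch below (function name and chunk size are my own choices, not from the original answer) keeps peak memory proportional to the chunk size rather than the array size, and accumulates in float64 to avoid precision loss on a 30GB float32 array:

```python
import numpy as np

def chunked_std(arr, chunk_size=1_000_000):
    """Standard deviation of a large (possibly memory-mapped) array,
    computed in two streaming passes over flat chunks."""
    flat = arr.reshape(-1)  # a view for contiguous memmaps, not a copy
    n = flat.size

    # Pass 1: mean, accumulated in float64 for numerical stability.
    total = 0.0
    for start in range(0, n, chunk_size):
        total += flat[start:start + chunk_size].sum(dtype=np.float64)
    mean = total / n

    # Pass 2: sum of squared deviations from the mean.
    sq_sum = 0.0
    for start in range(0, n, chunk_size):
        chunk = flat[start:start + chunk_size].astype(np.float64)
        sq_sum += ((chunk - mean) ** 2).sum()

    # Population standard deviation (ddof=0), matching numpy.std's default.
    return np.sqrt(sq_sum / n)
```

Only one chunk is materialized in memory at a time, so this works on a memmap far larger than RAM; numpy.std, by contrast, builds a full-size temporary when it subtracts the mean.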