I saved a couple of numpy arrays with np.save(), and put together they're quite huge.

Is it possible to load them all as memory-mapped files, and then concatenate and slice through all of them without ever loading anything into memory?
If you use `order='F'`, it will lead to another problem: when you load the file the next time, it will be quite a mess even if you pass `order='F'` again. So my solution is below; I have tested it a lot and it works fine.
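A minimal sketch of that kind of fix, assuming the idea is to stay in the default C (row-major) order throughout and write the result to a new file (all names, dtypes, and shapes below are illustrative, not from the original post):

```python
import numpy as np

# Two source arrays stored as raw memmaps in the default C (row-major) order.
a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 3000))
a[:, :] = 1.0
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(5000, 1000))
b[:, :] = 2.0

# Result memmap in a new file, also C order, so reloading it later
# needs no order juggling.
c = np.memmap('c.array', dtype='float64', mode='w+', shape=(5000, 4000))

# Copy in row chunks; only one chunk is resident in memory at a time.
chunk = 512
for start in range(0, 5000, chunk):
    stop = min(start + chunk, 5000)
    c[start:stop, :3000] = a[start:stop, :]
    c[start:stop, 3000:] = b[start:stop, :]
c.flush()
```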
Using `numpy.concatenate` apparently loads the arrays into memory. To avoid this you can easily create a third `memmap` array in a new file and read the values from the arrays you wish to concatenate. More efficiently, you can also append new arrays to an already existing file on disk. In any case you must choose the right order for the array (row-major or column-major).
The following examples illustrate how to concatenate along axis 0 and axis 1.
1) concatenate along `axis=0`

You can define a third array reading the same file as the first array to be concatenated (here `a`) in mode `r+` (read and append), but with the shape of the final array you want to achieve after concatenation, like:
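A minimal sketch of that step (the file names, dtype, and shapes here are illustrative): `mode='r+'` keeps the existing data and grows the file on disk when the requested shape is larger than what is stored.

```python
import numpy as np

# Two source arrays stored as raw memmaps on disk.
a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 1000))
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(3000, 1000))
b[:, :] = 222

# Re-open a's file with the final shape; mode 'r+' keeps the existing
# rows and extends the file to make room for the new ones.
c = np.memmap('a.array', dtype='float64', mode='r+', shape=(8000, 1000))

# Write b into the appended region; neither array is fully loaded.
c[5000:, :] = b
c.flush()
```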
Concatenating along `axis=0` does not require passing `order='C'` because this is already the default order.

2) concatenate along `axis=1`
The arrays saved on disk are actually flattened, so if you create `c` with `mode='r+'` and `shape=(5000, 4000)` without changing the array order, the first 1000 elements of the second line in `a` will end up in the first line of `c`. But you can easily avoid this by passing `order='F'` (column-major) to `memmap`:
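A sketch of that variant (again with illustrative names and shapes); note that for the bytes to line up, `a` itself must also have been written in column-major order, so that the data appended at the end of the file becomes the new trailing columns:

```python
import numpy as np

# Source arrays written in column-major (Fortran) order.
a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 3000), order='F')
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(5000, 1000), order='F')
b[:, :] = 222

# Re-open a's file with the final shape in column-major order; the bytes
# appended at the end of the file become columns 3000..3999.
c = np.memmap('a.array', dtype='float64', mode='r+', shape=(5000, 4000), order='F')

# Copy b into the new columns without fully loading either array.
c[:, 3000:] = b
c.flush()
```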
Here you have an updated file 'a.array' with the concatenation result. You may repeat this process to concatenate the arrays two at a time.