Is it possible to save a numpy array by appending it to an already existing npy-file, with something like np.save(filename, arr, mode='a')?
I have several functions that have to iterate over the rows of a large array. I cannot create the array at once because of memory constraints. To avoid creating the rows over and over again, I wanted to create each row once and save it to a file, appending it to the previous row in the file. Later I could load the npy-file in mmap_mode, accessing the slices when needed.
You can try something like reading the file, adding the new data, and saving the result again. After two such operations, the file holds both batches of data.
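The code for this answer did not survive here, so the following is only a minimal sketch of the read, append, and save idea. The helper name append_by_rewrite and the file name save.npy are made up for illustration. Note that this loads the whole existing array into memory on every call, so it only suits data that fits in RAM.

```python
import os
import tempfile

import numpy as np


def append_by_rewrite(filename, new_data):
    """Load the existing array (if any), append new_data, and save everything again."""
    new_data = np.asarray(new_data)
    if os.path.isfile(filename):
        existing = np.load(filename)  # reads the whole file into memory
        combined = np.concatenate((existing, new_data), axis=0)
    else:
        combined = new_data
    np.save(filename, combined)  # rewrites the complete file


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "save.npy")  # hypothetical file name
    x = np.arange(10)
    append_by_rewrite(path, x)  # first operation: creates the file
    append_by_rewrite(path, x)  # second operation: appends to it
    result = np.load(path)

print(result)
```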
The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy. However, when you start having large amounts of data, a file format designed to handle such datasets, such as HDF5, is to be preferred [1].
For instance, numpy arrays can be saved in HDF5 with PyTables in three steps:
Step 1: Create an extendable EArray storage
Step 2: Append rows to an existing dataset (if needed)
Step 3: Read back a subset of the data
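The three steps above might look roughly like this. It is a sketch that assumes PyTables is installed (imported as tables); the file name outarray.h5, the node name data, and the sizes are illustrative choices, not values from the original answer.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables, a third-party package

ROW_SIZE = 100  # illustrative row length
NUM_ROWS = 200  # illustrative number of rows written in step 1

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "outarray.h5")  # hypothetical file name

    # Step 1: create an extendable EArray; the 0 marks the axis that can grow.
    with tables.open_file(filename, mode="w") as f:
        atom = tables.Float64Atom()
        earray = f.create_earray(f.root, "data", atom, (0, ROW_SIZE))
        for _ in range(NUM_ROWS):
            earray.append(np.random.rand(1, ROW_SIZE))  # one row at a time

    # Step 2: reopen the file in append mode and add another row.
    with tables.open_file(filename, mode="a") as f:
        f.root.data.append(np.random.rand(1, ROW_SIZE))

    # Step 3: read back only a slice; just that part is read from disk.
    with tables.open_file(filename, mode="r") as f:
        total_shape = tuple(f.root.data.shape)
        subset = f.root.data[1:10, 2:20]

print(total_shape, subset.shape)
```

Only one row needs to be in memory at a time while writing, which matches the constraint described in the question.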
This is an expansion on Mohit Pandey's answer showing a full save / load example. It was tested using Python 3.6 and Numpy 1.11.3.
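The original listing is missing here; a sketch of what such a save / load round trip could look like is given below. The file name temp.npy is an assumption. Each np.save call on the open handle writes a complete .npy record (header plus data), so loading means calling np.load repeatedly until the end of the file is reached.

```python
import os
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "temp.npy"  # hypothetical file name

    # Save: open in append-binary mode and write two arrays back to back.
    with p.open("ab") as f:
        np.save(f, np.zeros(2))
        np.save(f, np.ones(2))

    # Load: read one array per np.load call until the file is exhausted.
    with p.open("rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        out = np.load(f)
        while f.tell() < file_size:
            out = np.vstack((out, np.load(f)))

print(out)
```

A file written this way is a sequence of separate arrays, so it cannot be opened as one large array with mmap_mode.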
To append data to an already existing file using numpy.save, open the file in append mode and pass the open file handle to numpy.save instead of a filename.
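A minimal sketch of that idea follows, written for Python 3; the helper name append_array and the file name rows.npy are hypothetical. It also shows a caveat: calling np.load with the filename afterwards returns only the first array stored in the file.

```python
import os
import tempfile

import numpy as np


def append_array(filename, arr):
    """Append arr to filename as an additional .npy record."""
    with open(filename, "ab") as f_handle:  # 'a' = append, 'b' = binary
        np.save(f_handle, arr)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.npy")  # hypothetical file name
    append_array(path, np.arange(3))
    size_after_first = os.path.getsize(path)
    append_array(path, np.arange(3, 6))
    size_after_second = os.path.getsize(path)

    # Loading by filename reads just the first record in the file.
    first_only = np.load(path)

print(size_after_first, size_after_second, first_only)
```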
I have checked that it works in Python 2.7 and NumPy 1.10.4.
I adapted the code from a post that discusses the savetxt method.
.npy files contain a header that stores the shape and dtype of the array. If you know what your resulting array will look like, you can write the header yourself and then write the data in chunks; this works, for example, for concatenating 2D matrices.
If you need a more general solution (editing the header in place while appending), you'll have to resort to fseek tricks like in [1].
Inspired by:
[1]: https://mail.scipy.org/pipermail/numpy-discussion/2009-August/044570.html (doesn't work out of the box)
[2]: https://docs.scipy.org/doc/numpy/neps/npy-format.html
[3]: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py
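The header-then-chunks approach from this last answer can be sketched as follows, assuming the final shape is known in advance. This is not the answer's original code: the function name write_rows_in_chunks and the file name big.npy are made up, and the sketch uses write_array_header_1_0 and dtype_to_descr from numpy.lib.format. The resulting file is an ordinary .npy file, so it can be opened with mmap_mode afterwards.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def write_rows_in_chunks(filename, row_chunks, n_rows, n_cols, dtype=np.float64):
    """Write a header for the final (n_rows, n_cols) array, then stream the chunks."""
    dtype = np.dtype(dtype)
    header = {
        "descr": npy_format.dtype_to_descr(dtype),
        "fortran_order": False,  # data is written row by row (C order)
        "shape": (n_rows, n_cols),
    }
    written = 0
    with open(filename, "wb") as f:
        npy_format.write_array_header_1_0(f, header)
        for chunk in row_chunks:
            chunk = np.ascontiguousarray(chunk, dtype=dtype)
            if chunk.ndim != 2 or chunk.shape[1] != n_cols:
                raise ValueError("each chunk must have shape (k, n_cols)")
            f.write(chunk.tobytes())
            written += chunk.shape[0]
    if written != n_rows:
        raise ValueError("number of rows written does not match the header")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.npy")  # hypothetical file name
    # Three chunks of two rows each; only one chunk is in memory at a time.
    chunks = (np.full((2, 4), i, dtype=np.float64) for i in range(3))
    write_rows_in_chunks(path, chunks, n_rows=6, n_cols=4)

    mm = np.load(path, mmap_mode="r")  # memory-mapped, not read eagerly
    first_rows = np.array(mm[:2])      # copy just a slice
    loaded = np.array(mm)              # full copy, only for checking the demo
    del mm                             # release the mapping before cleanup

print(loaded.shape)
```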