I have around 700 matrices stored on disk, each with around 70k rows and 300 columns.
I have to load parts of these matrices relatively quickly, around 1k rows per matrix, into another matrix I have in memory. The fastest way I have found to do this is using memory maps; initially I am able to load the 1k rows in around 0.02 seconds. However, performance is not consistent at all, and sometimes loading takes up to 1 second per matrix!
My code looks like this roughly:
import os
import numpy as np

target = np.zeros((7000, 300))
target.fill(-1)  # touch the array so the memory is actually allocated

for fname in os.listdir(folder_with_memmaps):
    path = os.path.join(folder_with_memmaps, fname)  # listdir returns bare file names
    X = np.memmap(path, dtype=_DTYPE_MEMMAPS, mode='r', shape=(70000, 300))
    indices_in_target = ...  # some magic
    indices_in_X = ...  # some magic
    target[indices_in_target, :] = X[indices_in_X, :]
With line-by-line timing I determined that it is definitely the last line that slows down over time.
Update: Plotting the load times gives different results from run to run. One time it looked like this, i.e. the degradation was not gradual but instead jumped after precisely 400 files. Could this be some OS limit?
But another time it looked completely different.
After a few more test runs, it seems that the second plot is rather typical of how the performance develops.
Also, I tried del X after the loop, without any impact. Neither did closing the underlying Python mmap via X._mmap.close() help.
Any ideas as to why there is inconsistent performance? Are there any faster alternatives to store & retrieve these matrices?
HDDs are poor at "serving more than one master" -- the slowdown can be much larger than one might expect. To demonstrate, I used this code to read the backup files (about 50 MB each) on the HDD of my Ubuntu 12.04 machine:
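The exact snippet isn't reproduced here; a minimal sketch of such a reader "process" could look like the following, where the file pattern and chunk size are placeholders:

import glob
import time

CHUNK = 16 * 1024 * 1024  # read in 16 MiB blocks

def read_files(pattern='/backup/*.bak'):
    # Stream every matching file from disk and report the effective bandwidth.
    total_bytes = 0
    start = time.time()
    for fname in glob.glob(pattern):
        with open(fname, 'rb') as f:
            while True:
                block = f.read(CHUNK)
                if not block:
                    break
                total_bytes += len(block)
    elapsed = time.time() - start
    print('%.1f MB/s' % (total_bytes / elapsed / 1e6))

read_files()

Launching two copies of this script at the same time, pointed at different sets of files, reproduces the parallel-reader situation described below.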
Running one of these "processes" gives me decent read performance.
Adding a second such process in parallel slows things down by more than an order of magnitude.
My guess is this (i.e., other HDD use) is the reason for your inconsistent performance. My hunch is an SSD would do significantly better. For my machine, for large files on the SSD the slowdown due to a parallel reader process was only twofold, from about 440 MB/s to about 220 MB/s. (See my comment.)
You might consider using bcolz. It compresses numerical data on disk and in memory to speed things up. You may have to transpose the matrices to get a sparse read, since bcolz stores things by column rather than by row.
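For illustration, a minimal sketch of that approach using bcolz's carray container might look like this (not code from this thread; the paths, shape, dtype, and row indices are made up):

import bcolz
import numpy as np

# One-time conversion: persist a matrix as a compressed, chunked carray on disk.
# 'matrices/matrix_0.bcolz' and the shape/dtype are placeholders.
matrix = np.random.rand(70000, 300).astype(np.float32)
ca = bcolz.carray(matrix, rootdir='matrices/matrix_0.bcolz', mode='w')
ca.flush()

# Later: opening only reads metadata; fetching individual rows decompresses
# just the chunks those rows live in.
ca = bcolz.open('matrices/matrix_0.bcolz', mode='r')
wanted_rows = [3, 1000, 42000]  # stand-in for the real index "magic"
part = np.vstack([ca[i] for i in wanted_rows])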