How to load an npz file in a non-lazy way?

Published 2020-07-30 04:03

Question:

NumPy's load() function returns a lazy file loader, not the actual data, when opening an .npz file. How can I load an .npz file so that the arrays are read into memory immediately?

Answer 1:

If you want to force the contents of the arrays to be read and decompressed, just assign their contents to variables, e.g.:

import numpy as np

data = np.load('/path/to/data.npz')
a = data['a']  # indexing an entry reads and decompresses it into memory
b = data['b']
# etc.

If you wanted to keep the exact same syntax as with the lazy loader, you could simply load all of the arrays into a dict, e.g.:

data_dict = dict(data)

So now you could use

data_dict['a']

to refer to a in later parts of your script. Personally I wouldn't keep the dict around, though, since the fact that it holds references to all of the arrays would prevent any individual unused ones from being garbage collected later on in your script.
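As a self-contained sketch of the dict approach (the file path and array names here are invented for illustration), note that once the dict is built, the NpzFile can be closed and the arrays remain usable:

```python
import os
import tempfile

import numpy as np

# Create a small example .npz file (names 'a' and 'b' are illustrative).
path = os.path.join(tempfile.mkdtemp(), 'data.npz')
np.savez(path, a=np.arange(3), b=np.ones((2, 2)))

data = np.load(path)    # lazy NpzFile, nothing decompressed yet
data_dict = dict(data)  # forces every array to be read into memory
data.close()            # safe: data_dict now holds real ndarrays

print(data_dict['a'])
```

The close() call is the point of the exercise: after it, you know nothing is being read lazily from disk anymore.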



Answer 2:

I think you answered this yourself in your previous question about speed:

data = np.load(dataset_text_filepath)['texts']

The file contents are now in memory.

The .npz file is a zip archive containing multiple arrays, one .npy member per array. The reason for making load a two-step operation is that you might not always want to load all of the arrays at once. It lets you load x without loading y.
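A minimal sketch of that selective load (the array names x and y follow the answer's wording; the temporary file is invented for illustration):

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'arrays.npz')
np.savez(path, x=np.arange(5), y=np.zeros(1_000_000))  # y is much larger

with np.load(path) as data:
    x = data['x']  # only the 'x' member is read and decompressed
    # data['y'] is never indexed, so the large array is never loaded
```

Because indexing is what triggers the read, the cost of the large y array is avoided entirely.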

You could use a system zip archive tool to extract one or more of the files, and then load that directly. That can be a useful step just to better understand the file structure.
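For instance, the standard-library zipfile module can list the members and extract one of them directly (file path and array names invented for illustration):

```python
import io
import os
import tempfile
import zipfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'data.npz')
np.savez(path, a=np.arange(3), b=np.ones(2))

with zipfile.ZipFile(path) as zf:
    names = zf.namelist()  # one .npy member per saved array
    with zf.open('a.npy') as f:
        # Load a single member without going through np.load on the .npz.
        a = np.load(io.BytesIO(f.read()))

print(names)  # e.g. ['a.npy', 'b.npy']
```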

To be any more direct you need to study np.lib.npyio.NpzFile and perhaps the standard-library zipfile module (an .npz is a zip archive, not a gzip file).
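A quick way to see the NpzFile structure without reading any array data (the temporary file is invented; `files` is a documented attribute of NpzFile):

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'data.npz')
np.savez(path, a=np.arange(3), b=np.ones(2))

data = np.load(path)
print(type(data))   # the lazy loader class, np.lib.npyio.NpzFile
print(data.files)   # names of the stored arrays, e.g. ['a', 'b']
data.close()
```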