Question:
NumPy's load() function returns a lazy file loader, not the actual data, when loading an .npz file. How can I load an .npz file so that the data gets loaded into memory?
Answer 1:
If you want to force the contents of the arrays to be read and decompressed, just assign their contents to variables, e.g.:
import numpy as np

data = np.load('/path/to/data.npz', 'r')
a = data['a']
b = data['b']
# etc.
If you wanted to keep the exact same syntax as with the lazy loader, you could simply load all of the arrays into a dict, e.g.:
data_dict = dict(data)
Now you can use data_dict['a'] to refer to a in later parts of your script. Personally, I wouldn't keep the dict around, though, since holding references to all of the arrays would prevent any individual unused ones from being garbage collected later in your script.
Answer 2:
I think you answered your question in the previous one about speed:
data = np.load(dataset_text_filepath)['texts']
The file contents are now in memory.
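As a concrete illustration of that line (the file path and the 'texts' key are stand-ins invented for this example), indexing the loader pulls that one array fully into memory as a plain ndarray:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for dataset_text_filepath.
tmpdir = tempfile.mkdtemp()
dataset_text_filepath = os.path.join(tmpdir, "texts.npz")
np.savez(dataset_text_filepath, texts=np.array(["hello", "world"]))

# Indexing the NpzFile reads and decompresses just this one array.
texts = np.load(dataset_text_filepath)["texts"]

# texts is now an ordinary in-memory ndarray.
print(type(texts).__name__)  # ndarray
```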
The .npz file is a zip archive containing multiple arrays. The reason load is a two-step operation is that you might not always want to load all of the arrays at once: it lets you load x without loading y.
You could use a system zip archive tool to extract one or more of the files, and then load that directly. That can be a useful step just to better understand the file structure.
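A sketch of that inspection step using the standard-library zipfile module instead of an external tool (the path and array names are made up):

```python
import io
import os
import tempfile
import zipfile

import numpy as np

# Write a sample .npz file (hypothetical data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.npz")
np.savez(path, x=np.arange(4), y=np.zeros(2))

# An .npz file is an ordinary zip archive with one .npy member per array.
with zipfile.ZipFile(path) as zf:
    names = sorted(zf.namelist())
    print(names)  # ['x.npy', 'y.npy']
    # Extract a single member and parse it as a stand-alone .npy file.
    x = np.load(io.BytesIO(zf.read("x.npy")))

print(x)  # [0 1 2 3]
```

Listing the members this way makes the file structure obvious, and reading one member pulls in only that array.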
To dig any deeper you need to study np.lib.npyio.NpzFile and maybe the zipfile module (an .npz is a zip archive, so zipfile, not gzip, is the relevant standard-library tool).