How to load an .npz file in a non-lazy way?

Asked 2020-07-30 04:31

NumPy's load() function returns a lazy file loader, not the actual data, when given an .npz file. How can I load an .npz file so that the data actually gets read into memory?
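
For example (a minimal sketch of what I mean; the file name and keys are just placeholders):

import numpy as np

np.savez('dataset.npz', texts=np.arange(5), labels=np.zeros(5))
data = np.load('dataset.npz')
print(type(data))  # numpy.lib.npyio.NpzFile -- a lazy loader, not the arrays themselves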

2 Answers
Fickle 薄情 · answered 2020-07-30 04:44

I think you already answered this in your previous question about speed:

data = np.load(dataset_text_filepath)['texts']

The file contents are now in memory.

The .npz file is a zip archive containing multiple arrays. The reason load is a two-step operation is that you might not always want to load all of the arrays at once: it lets you load x without loading y.
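
For instance, a minimal sketch of that selective loading (the file name and the keys x and y are just placeholders):

import numpy as np

np.savez('example.npz', x=np.arange(10), y=np.ones((1000, 1000)))

archive = np.load('example.npz')  # lazy: no array data is read yet
x = archive['x']                  # only the 'x' member is read into memory
archive.close()                   # 'y' was never loaded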

You could use a system zip tool to extract one or more of the member .npy files and then load those directly. That can be a useful step just to get a better feel for the file structure.

To be any more direct than that you would need to study np.lib.npyio.NpzFile and maybe the standard zipfile module.
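
As a rough sketch along those lines (the file name and member names are placeholders), you can open the archive with zipfile yourself and load a single member as a plain .npy file:

import io
import zipfile
import numpy as np

with zipfile.ZipFile('example.npz') as zf:
    print(zf.namelist())                        # e.g. ['x.npy', 'y.npy']
    with zf.open('x.npy') as member:
        x = np.load(io.BytesIO(member.read()))  # load just this one array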

劳资没心,怎么记你 · answered 2020-07-30 04:57

If you want to force the contents of the arrays to be read and decompressed, just assign their contents to variables, e.g.:

import numpy as np

data = np.load('/path/to/data.npz')  # lazy NpzFile object
a = data['a']  # indexing reads and decompresses this array into memory
b = data['b']
# etc.

If you wanted to keep the exact same syntax as with the lazy loader, you could simply load all of the arrays into a dict, e.g.:

data_dict = dict(data)

So now you could use

data_dict['a']

to refer to a in later parts of your script. Personally I wouldn't keep the dict around, though: since it holds references to all of the arrays, any individual arrays you no longer use can't be garbage collected later on in your script.
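
If you do go the dict route, one sketch (the file path and key names are placeholders) is to do the eager copy inside a with block, so the underlying file handle gets closed once everything is in memory:

import numpy as np

with np.load('/path/to/data.npz') as data:
    arrays = {name: data[name] for name in data.files}  # forces read + decompress

# The NpzFile is closed here; 'arrays' holds ordinary in-memory ndarrays.
print(list(arrays))

You can then del entries from the dict as you finish with them, which avoids the garbage-collection issue mentioned above.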
