IPython loading variables to workspace: can you th

Posted 2019-02-19 23:23

Question:

I'm migrating from MATLAB to ipython and before taking the leap I'm going through my minimal workflow to make sure every operation I perform daily on MATLAB for data crunching is available on ipython.

I'm currently stuck on the very basic task of saving and loading numpy arrays via a one-line command, such as MATLAB's:

>>> save('myresults.mat','a','b','c')
>>> load('myresults.mat')

In particular, what I like about MATLAB's load command is that it not only reads the data file but also loads the variables into the workspace; nothing else is needed to start working with them. Note that this is not the case with, for instance, numpy.load(), which requires another line to assign the loaded values to workspace variables. [ See: IPython: how to automagically load npz file and assign values to variables? ]

Based on the answers and comments to that question, I came up with this dirty-bad-engineering-ugly-coding-but-working solution. I know it's not pretty, and I would like to know if you can come up with the correct version of this [1].

I put this into iocustom.py:

def load(filename):
    ip = get_ipython()
    ip.ex("import numpy as np")
    ip.ex("locals().update(np.load('" + filename + "'))") 

so that I can run, from the ipython session:

from iocustom import load
load('myresults.npz')

and the variables are dumped to the workspace.

I find it hard to believe there's nothing built-in equivalent to this, and it's even harder to think that that 3-line function is the optimal solution. I would be very grateful if you could please suggest a more correct way of doing this.

Please keep in mind that:

  • I'm looking for a solution which would also work inside a script and a function.
  • I know there's "pickle" but I refuse to use more than one line of code for something as mundane as a simple 'save' and/or 'load' command.
  • I know there's "savemat" and "loadmat" available from scipy, but I would like to migrate completely, i.e., do not work with mat files but with numpy arrays.

Thanks in advance for all your help.

[1] BTW: how do people working with ipython save and load a set of numpy arrays easily? After hours of googling I cannot seem to find a simple and straightforward solution for this daily task.

Answer 1:

If I save this as load_on_run.py:

import argparse
import numpy as np
if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-l','--list', help='list variables', action='store_true')
    parser.add_argument('filename')
    __args = parser.parse_args()
    data = np.load(__args.filename)
    locals().update(data)
    del parser, data, argparse, np
    if __args.list:
        print([k for k in locals() if not k.startswith('__')])
    del __args

And then in ipython I can invoke it with %run:

In [384]: %run load_on_run testarrays.npz -l
['array2', 'array3', 'array4', 'array1']
In [385]: array3
Out[385]: array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1])

It neatly loads the arrays from the file into the ipython workspace.

I'm taking advantage of the fact that magic %run runs a script, leaving all functions and variables defined by it in the main namespace. I haven't looked into how it does this.

The script just takes a few arguments, loads the file (so far only .npz), and uses the locals().update trick to put its variables into the local namespace. Then I clear out the unnecessary variables and modules, leaving only the newly loaded ones.
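A note on the locals().update trick, as a small sketch: it works in the script above because that code runs at module level, where names written into the namespace dict become real variables. Inside a function body the same call does not create a usable local in CPython, which matters if you want the loader to work from within a function. The names update_inside_function and loaded_value are made up for this illustration.

```python
def update_inside_function():
    # Try to inject a variable into the function's local namespace.
    locals().update({"loaded_value": 42})
    try:
        # The compiler treats this as a global lookup, and the
        # update above did not create any such name.
        return loaded_value
    except NameError:
        return "not defined"


result = update_inside_function()
print(result)
```

So for use inside a function it is probably safer to write into an explicit dict (or keep the NpzFile object around) than to rely on locals().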

I could probably define an alias for %run load_on_run.

I can also imagine a script along these lines that lets you load variables with an import: from <script> import *.



Answer 2:

You could assign the values in the npz file to global variables:

import numpy as np

def spill(filename):
    f = np.load(filename)
    for key, val in f.items():
        globals()[key] = val
    f.close()

This solution works in Python2 and Python3, and in any flavor of interactive shell, not just IPython. Using spill is fine for interactive use, but not for scripts, because:

  1. It gives the file the ability to rebind arbitrary names to arbitrary values. That can lead to surprising, hard to debug behavior, or even be a security risk.
  2. Dynamically created variable names are hard to program with. As the Zen of Python (import this) says, "Namespaces are one honking great idea -- let's do more of those!" For a script it is better to keep the values in the NpzFile, f, and access them by indexing, such as f['x'].
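As a rough sketch of the two points above (not a definitive implementation): the hypothetical helper load_into below copies the arrays into a dict that the caller chooses, instead of silently rebinding globals, and the last lines show the index-the-NpzFile style recommended for scripts. The helper name, the variable names, and the temporary file path are all assumptions made for the example; np.savez, np.load, and the NpzFile .files attribute are the real NumPy calls.

```python
import os
import tempfile

import numpy as np


def load_into(filename, namespace):
    """Copy every array in an .npz file into the given dict (hypothetical helper)."""
    with np.load(filename) as f:
        names = sorted(f.files)      # names the arrays were saved under
        for key in names:
            namespace[key] = f[key]  # read each array while the file is open
    return names


# One-line save of several named arrays to a throwaway location.
path = os.path.join(tempfile.mkdtemp(), "myresults.npz")
np.savez(path, a=np.arange(3), b=np.ones(2))

# Load into an explicit dict rather than into globals().
workspace = {}
names = load_into(path, workspace)
print(names)

# Script style: keep the NpzFile and access the values by key.
with np.load(path) as f:
    total = int(f["a"].sum())
print(total)
```

In an interactive session you could pass globals() as the namespace, or get_ipython().user_ns under IPython, to get the MATLAB-like behavior; in a script, passing a plain dict keeps the loaded names contained.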