How to work with interactively-defined classes in

2019-04-27 12:04发布

问题:

Context

In an interactive prototyping development on the notebook connected to a cluster, I would like to define a class that is both available in the client __main__ session and interactively update on the cluster engine nodes to be able to move instances of that class around by passing such instances a argument to a LoadBalanced view. The following demonstrates the typical user session:

First setup the parallel clustering environment:

>>> from IPython.parallel import Client
>>> rc = Client()
>>> lview = rc.load_balanced_view()
>>> rc[:]
<DirectView [0, 1, 2]>

In a notebook cell let's define the code snippet of the component we are interactively editing:

>>> class MyClass(object):
...     def __init__(self, parameter):
...         self.parameter = parameter
...
...     def update_something(self, some_data):
...         # do something smart here with some_data & internal state
...
...     def compute_something(self, other_data):
...         # do something smart here with other data & internal state
...         return something
...

In the next cell, let's create a script that builds instances of this class and then use the load balanced view of the cluster environment to evaluate our component on a wide range of input parameters:

>>> def process(obj, some_data, other_data):
...     obj.update_something(some_data)
...     return obj.compute_something(other_data)
...
>>> tasks = []
>>> some_instances = [MyClass(i) for i in range(10)]
>>> for obj in some_instances:
...    for some_data in data_source_1:
...         for other_data in data_source_2:
...             ar = lview.apply_async(process, obj, some_data, other_data)
...             tasks.append(ar)
...
>>> # wait for computation to end
>>> results = [ar.get() for ar in tasks] 

Problem

That will obviously not work as the engines of the load balanced view will be unable to unpickle the instances passed as first argument to the process function. The process function definition itself is passed successfully as I assume that apply_async does bytecode instrospection to pickle it (by accessing the .code attribute of the function) and then just does a simple pickle for the remaining arguments.

Possible solutions (that don't work for me)

  • One alternative solution would be to use the %%px cell magic on the cell holding the definition of the class MyClass. However that would prevent me to build the class instances in the client script that also do the scheduling. I would need to copy and paste the cell content in an other cell without the %%px magic (or execute the cell twice once with magic and another time without the magic) but this is tedious when I am still editing the methods of the class in an iterative development & evaluation setting.

  • An alternative solution would be to embed the class definition inside the process function but I find this not practical as I would like to reuse that class definition in other functions later in my notebook.

  • Alternatively I could just stop using a class and only work with functions that can be shipped over to the engines by passing then as first argument to the apply_async. However I don't like that either as I would like to prototype my code in an object oriented way for later extraction from the notebook and including the resulting class in an object oriented library. The notebook session serving as a collaborative prototyping tool using for exchanging ideas between developers using the http://nbviewer.ipython.org publisher.

  • The final alternative would be to write my class in a python module on a file on the filesystem and ship that file to the engines PYTHONPATH using NFS for instance. That works but prevent me to work only in the notebook environment which defeats the whole purpose of interactive prototyping in the notebook.

So basically, is there a way to define a class interactively and then ship its definition around to the engines?

It should be possible to pickle a class definition using the inspect.getsource in the client then send the source to the engines and use the eval builtin but unfortunately source inspection does not work for classes defined inside the DummyMod built-in module:

TypeError: <IPython.core.interactiveshell.DummyMod object at 0x10c2c4e50> is a built-in class

Is there a way to inspect the bytecode of a class definition instead?

Or is it possible to use the %%px magic so as to both execute the content of the cell locally on the client and on each engine?

回答1:

Thanks for the detailed question (and pinging me on Twitter).

First, maybe it should be considered a bug that you can't just push classes, because the simple solution should be

rc[:]['MyClass'] = MyClass

but pickling interactively defined classes results only in a reference ('\x80\x02c__main__\nMyClass\nq\x01.'), giving your DummyMod AttributeError. This can probably be fixed internally in IPython's serialization.

On to an actual working solution, though.

Adding local execution to %%px is super easy, just:

def pxlocal(line, cell):
    ip = get_ipython()
    ip.run_cell_magic("px", line, cell)
    ip.run_cell(cell)
get_ipython().register_magic_function(pxlocal, "cell")

And now you have a %%pxlocal magic that runs %%px in addition to running the cell locally.

Then all you have to do is:

%%pxlocal

class MyClass(object):
    # etc

to define your class everywhere. I will add a --local flag to %%px, so this extra step isn't necessary.

A complete, working example notebook.



回答2:

I think you could use "dill" to pickle the interactively defined class, and not have to worry about %%pxlocal magic, using DummyMod, and faking of namespaces.

To pickle a class interactively, just do "import dill" and then build your class as you first did. You should be able to then send it across any sane map or apply_async function.