Is it possible to programmatically construct a Python stack frame and start execution at an arbitrary point in the code?

Published 2019-01-21 11:54

Question:

Is it possible to programmatically construct a stack (one or more stack frames) in CPython and start execution at an arbitrary code point? Imagine the following scenario:

  1. You have a workflow engine where workflows can be scripted in Python with some constructs (e.g. branching, waiting/joining) that are calls to the workflow engine.

  2. A blocking call, such as a wait or join sets up a listener condition in an event-dispatching engine with a persistent backing store of some sort.

  3. You have a workflow script, which calls the Wait condition in the engine, waiting for some condition that will be signalled later. This sets up the listener in the event-dispatching engine. (A sketch of such a script follows this list.)

  4. The workflow script's state, relevant stack frames including the program counter (or equivalent state) are persisted - as the wait condition could occur days or months later.

  5. In the interim, the workflow engine might be stopped and re-started, meaning that it must be possible to programmatically store and reconstruct the context of the workflow script.

  6. The event dispatching engine fires the event that the wait condition picks up.

  7. The workflow engine reads the serialised state and stack and reconstructs a thread with the stack. It then continues execution at the point where the wait service was called.
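
To make the scenario concrete, the sketch below shows what such a workflow script might look like. The WorkflowEngine class and its wait_for() call are hypothetical, invented purely for illustration; they are not a real API.

# Sketch of a workflow script against a hypothetical engine API.
# WorkflowEngine and wait_for() are invented for illustration only.

class WorkflowEngine(object):
    def wait_for(self, event_name, key):
        """Register a listener with the event-dispatching engine and block.
        In the real system this is where the stack would be persisted and
        the process possibly shut down for days or months."""
        raise NotImplementedError

def order_workflow(engine, order_id):
    print("reserving stock for order", order_id)
    # Blocking call: may not return for a very long time.
    payment = engine.wait_for("payment_received", key=order_id)
    # Execution must resume *here* once the event fires, which is
    # exactly what the question is asking how to achieve.
    print("shipping order", order_id, "paid via", payment)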

The Question

Can this be done with an unmodified Python interpreter? Even better, can anyone point me to some documentation that might cover this sort of thing or an example of code that programmatically constructs a stack frame and starts execution somewhere in the middle of a block of code?

Edit: To clarify 'unmodified python interpreter', I don't mind using the C API (is there enough information in a PyThreadState to do this?) but I don't want to go poking around the internals of the Python interpreter and having to build a modified one.

Update: From some initial investigation, one can get the execution context with PyThreadState_Get(). This returns the thread state in a PyThreadState (defined in pystate.h), which has a reference to the stack frame in frame. A stack frame is held in a struct typedef'd to PyFrameObject, which is defined in frameobject.h. PyFrameObject has a field f_lasti (props to bobince) which has a program counter expressed as an offset from the beginning of the code block.

That last point is sort of good news, because it means that as long as you preserve the actual compiled code block, you should be able to reconstruct the locals for as many stack frames as necessary and re-start the code. I'd say this means it is theoretically possible without having to build a modified Python interpreter, although the code is still probably going to be fiddly and tightly coupled to specific versions of the interpreter.
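
The same fields are visible from pure Python via sys._getframe(), which is a convenient way to poke at them before dropping down to the C API. A minimal sketch:

import sys

def inspect_current_frame():
    frame = sys._getframe()     # the live frame object for this call
    code = frame.f_code         # the compiled code block (PyCodeObject)
    print("function:", code.co_name)
    print("f_lasti (bytecode offset):", frame.f_lasti)
    print("f_lineno (current source line):", frame.f_lineno)
    print("locals:", frame.f_locals)

inspect_current_frame()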

The three remaining problems are:

  • Transaction state and 'saga' rollback, which can probably be accomplished by the sort of metaclass hacking one would use to build an O/R mapper. I did build a prototype once, so I have a fair idea of how this might be accomplished.

  • Robustly serialising transaction state and arbitrary locals. This might be accomplished by reading f_locals (which is available from the stack frame) and programmatically constructing a call to pickle. However, I don't know what, if any, gotchas there might be here. (A rough sketch follows this list.)

  • Versioning and upgrade of workflows. This is somewhat trickier, as the system does not provide any symbolic anchors for workflow nodes; all we have are raw bytecode offsets. In order to do this, one would have to identify the offsets of all of the entry points and map them to the new version. Probably feasible to do manually, but I suspect it would be hard to automate. This is probably the biggest obstacle if you want to support this capability.
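
As a rough sketch of the locals-serialisation bullet above, using only the standard library; what to do about unpicklable values (open files, sockets, C extension objects) is exactly the open question:

import pickle
import sys

def snapshot_locals(frame):
    """Try to pickle each local in a frame; return (picklable, failed)."""
    picklable, failed = {}, {}
    for name, value in frame.f_locals.items():
        try:
            picklable[name] = pickle.dumps(value)
        except Exception as exc:        # pickle raises several error types
            failed[name] = repr(exc)
    return picklable, failed

def example(a=1, b="two"):
    handle = open(__file__)             # a typical problem case
    good, bad = snapshot_locals(sys._getframe())
    print("pickled:", sorted(good))
    print("failed:", bad)
    handle.close()

example()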

Update 2: PyCodeObject (code.h) holds a table of bytecode offset (f_lasti) to line number mappings in PyCodeObject.co_lnotab (correct me if I'm wrong here). This might be used to facilitate a migration process that updates workflows to a new version, as frozen instruction pointers could be mapped to the appropriate place in the new script in terms of line numbers. Still quite messy, but a little more promising.
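
The standard library can already decode that table: dis.findlinestarts() yields (bytecode offset, line number) pairs for a code object, which would be a starting point for the offset-to-line mapping described above. A small sketch:

import dis

def sample():
    x = 1
    y = x + 2
    return y

# Decode the offset -> line-number table (co_lnotab) without parsing it by hand.
for offset, lineno in dis.findlinestarts(sample.__code__):
    print("offset", offset, "-> line", lineno)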

Update 3: I think the answer to this might be Stackless Python. You can suspend tasklets and serialise them. I haven't yet worked out whether this also covers the stack.
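
For what it's worth, a sketch of what this looks like in Stackless, based on its documented tasklet-pickling support; the exact calls used here (schedule_remove, tasklet.insert) should be checked against the Stackless release you actually use:

import pickle
import stackless

def step_worker():
    print("step 1")
    stackless.schedule_remove()   # suspend here and drop out of the scheduler
    print("step 2")

t = stackless.tasklet(step_worker)()
stackless.run()                   # returns once step_worker is suspended

blob = pickle.dumps(t)            # serialise the suspended tasklet
# ...store blob, possibly restart the process, then later:
t2 = pickle.loads(blob)
t2.insert()                       # put the restored tasklet back on the scheduler
stackless.run()                   # prints "step 2"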

Answer 1:

The expat Python bindings included in the standard Python distribution construct stack frames programmatically. Be warned, though: this relies on undocumented, private APIs.

http://svn.python.org/view/python/trunk/Modules/pyexpat.c?rev=64048&view=auto



Answer 2:

What you generally want are continuations, which I see is already a tag on this question.

If you have the ability to work with all of the code in the system, you may want to try doing it this way rather than dealing with the interpreter stack internals. I'm not sure how easily this will be persisted.

http://www.ps.uni-sb.de/~duchier/python/continuations.html

In practice, I would structure your workflow engine so that your script submits action objects to a manager. The manager could pickle the set of actions at any point and allow them to be loaded and begin execution again (by resuming the submission of actions).

In other words: make your own, application-level, stack.
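
A minimal sketch of that application-level stack, with every name invented for illustration: the workflow is a list of steps, the manager records which step runs next, and that record plus the shared state is all that gets pickled.

import pickle

# Steps are module-level functions so plain pickle can reference them by name.
def reserve_stock(state):
    state["stock"] = "reserved"

def await_payment(state):
    # In the real system this would register a listener with the
    # event-dispatching engine; here it just asks to be suspended.
    return "wait"

def ship_order(state):
    print("shipping with state:", state)

class WorkflowManager(object):
    """Toy application-level 'stack': an index into a list of steps plus state."""

    def __init__(self, steps):
        self.steps = steps
        self.next = 0
        self.state = {}

    def run(self):
        while self.next < len(self.steps):
            outcome = self.steps[self.next](self.state)
            self.next += 1
            if outcome == "wait":
                return "suspended"
        return "finished"

wf = WorkflowManager([reserve_stock, await_payment, ship_order])
print(wf.run())             # -> suspended (after the wait step registers)
blob = pickle.dumps(wf)     # persist the whole application-level stack

# Days later, possibly in a new process: restore and continue.
wf2 = pickle.loads(blob)
print(wf2.run())            # -> finished (ships the order)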



Answer 3:

Stackless Python is probably the best option… if you don't mind switching to a completely different Python distribution. Stackless can serialize pretty much everything in Python, plus its tasklets. If you want to stay with the standard Python distribution, then I'd use dill, which can serialize almost anything in Python.

>>> import dill
>>> 
>>> def foo(a):
...   def bar(x):
...     return a*x
...   return bar
... 
>>> class baz(object):
...   def __call__(self, a,x):
...     return foo(a)(x)
... 
>>> b = baz()
>>> b(3,2)
6
>>> c = baz.__call__
>>> c(b,3,2)
6
>>> g = dill.loads(dill.dumps(globals()))
>>> g
{'dill': <module 'dill' from '/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/dill-0.2a.dev-py2.7.egg/dill/__init__.pyc'>, 'c': <unbound method baz.__call__>, 'b': <__main__.baz object at 0x4d61970>, 'g': {...}, '__builtins__': <module '__builtin__' (built-in)>, 'baz': <class '__main__.baz'>, '_version': '2', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x4d39d30>, '__doc__': None}

Dill registers its types into the pickle registry, so if you have some black-box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the third-party code.
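
As a smaller demonstration of the "almost anything" claim, dill will happily round-trip a lambda, which the standard pickle module refuses:

import dill

square = lambda x: x * x
blob = dill.dumps(square)     # plain pickle raises an error on lambdas
restored = dill.loads(blob)
print(restored(4))            # -> 16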

Here's dill pickling the whole interpreter session...

>>> # continuing from above
>>> dill.dump_session('foobar.pkl')
>>>
>>> ^D
dude@sakurai>$ python
Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('foobar.pkl')
>>> c(b,3,2)
6

dill also has some good tools for helping you understand what is causing the failure when your pickling fails.

You also asked where it's used to save interpreter state:

IPython can use dill to save the interpreter session to a file. https://nbtest.herokuapp.com/github/ipython/ipython/blob/master/examples/parallel/Using%20Dill.ipynb

klepto uses dill to support in-memory, to-disk, or to-database caching that avoids recomputation. https://github.com/uqfoundation/klepto/blob/master/tests/test_cache_info.py

mystic uses dill to save the checkpoints for large optimization jobs by saving the state of the optimizer as it's in progress. https://github.com/uqfoundation/mystic/blob/master/tests/test_solver_state.py

There are a couple other packages that use dill to save state of objects or sessions.



Answer 4:

You could grab the existing stack frame by throwing an exception and stepping back one frame in the traceback. The problem is that there is no way provided to resume execution in the middle (frame.f_lasti) of the code block.
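
Grabbing the frame that way looks roughly like the sketch below: you get a perfectly good read-only view of the paused state, but nothing lets you tell the interpreter to resume that frame at f_lasti.

import sys

def capture_caller_frame():
    try:
        raise RuntimeError          # raised only to obtain a traceback
    except RuntimeError:
        # step back one frame: from this helper to whoever called it
        return sys.exc_info()[2].tb_frame.f_back

def workflow_step():
    order = 42                      # some local workflow state
    frame = capture_caller_frame()
    print("bytecode offset:", frame.f_lasti, "line:", frame.f_lineno)
    print("locals:", frame.f_locals)
    # There is no supported way to resume execution at f_lasti from here.

workflow_step()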

“Resumable exceptions” are a really interesting language idea, although it's tricky to think of a reasonable way they could interact with Python's existing ‘try/finally’ and ‘with’ blocks.

For the moment, the normal way of doing this is simply to use threads to run your workflow in a separate context to its controller. (Or coroutines/greenlets if you don't mind compiling them in).
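
A bare-bones version of the threads-as-context approach, with a plain Queue standing in for the engine's wait/signal machinery (all names invented; Python 3 imports):

import threading
import queue

def workflow(events):
    print("reserving stock")
    payment = events.get()          # blocks this thread until the controller signals
    print("shipping order, paid via", payment)

events = queue.Queue()
worker = threading.Thread(target=workflow, args=(events,))
worker.start()

# The controller decides, possibly much later, when the event fires.
events.put("credit card")
worker.join()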



Answer 5:

With standard CPython this is complicated by the mixture of C and Python data in the stack. Rebuilding the call stack would require the C stack to be reconstructed at the same time. This really puts it in the too hard basket as it could potentially tightly couple the implementation to specific versions of CPython.

Stackless Python allows tasklets to be pickled, which gives most of the capability required out of the box.



Answer 6:

I have the same type of problem to solve. I wonder what the original poster decided to do.

Stackless claims it can pickle tasklets as long as there's no 'encumbered' C stack associated with them ('encumbered' is my choice of phrasing).

I'll probably use eventlet and figure out some way of pickling 'state'; I really don't want to write an explicit state machine, though.



Answer 7:

How about using joblib?

I'm not quite sure this is what you want, but it seems to fit the idea of a workflow whose stages can be persisted. Joblib's use case seems to be avoiding recomputation; I'm not sure whether that's what you're trying to do here or something more complicated.
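
If avoiding recomputation of expensive stages is really all that's needed, joblib's Memory cache is about this much code (the cache directory name here is invented):

from joblib import Memory

memory = Memory("./workflow_cache", verbose=0)

@memory.cache
def expensive_stage(order_id):
    print("actually computing stage for", order_id)
    return order_id * 2

print(expensive_stage(21))   # computed and written to ./workflow_cache
print(expensive_stage(21))   # served from the on-disk cache, no recomputation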