Is it possible to programmatically construct a Python stack frame and start execution at an arbitrary point in the code?

Posted 2019-01-21 11:21

Is it possible to programmatically construct a stack (one or more stack frames) in CPython and start execution at an arbitrary code point? Imagine the following scenario:

  1. You have a workflow engine where workflows can be scripted in Python with some constructs (e.g. branching, waiting/joining) that are calls to the workflow engine.

  2. A blocking call, such as a wait or join sets up a listener condition in an event-dispatching engine with a persistent backing store of some sort.

  3. You have a workflow script, which calls the Wait condition in the engine, waiting for some condition that will be signalled later. This sets up the listener in the event dispatching engine.

  4. The workflow script's state and relevant stack frames, including the program counter (or equivalent state), are persisted, as the wait condition could occur days or months later.

  5. In the interim, the workflow engine might be stopped and re-started, meaning that it must be possible to programmatically store and reconstruct the context of the workflow script.

  6. The event dispatching engine fires the event that the wait condition picks up.

  7. The workflow engine reads the serialised state and stack and reconstructs a thread with the stack. It then continues execution at the point where the wait service was called.
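To make the scenario concrete, here is a minimal in-process sketch. It assumes the workflow script is written as a Python generator that yields requests of a hypothetical `Wait` type to an equally hypothetical `Engine`. It covers steps 3, 6 and 7 within a single running process, but not the persistence in steps 4 and 5, which is exactly the hard part.

```python
from dataclasses import dataclass


@dataclass
class Wait:
    """Hypothetical request from a workflow script: block until `event` fires."""
    event: str


def approval_workflow(order_id):
    # Each `yield Wait(...)` hands control back to the engine until the event fires.
    approval = yield Wait(f"approved:{order_id}")
    shipped = yield Wait(f"shipped:{order_id}")
    return f"order {order_id}: approved by {approval}, shipped via {shipped}"


class Engine:
    """Toy engine: maps event names to suspended workflow generators."""

    def __init__(self):
        self.listeners = {}  # event name -> suspended generator
        self.results = []

    def start(self, workflow):
        self._advance(workflow, None)

    def fire(self, event, payload):
        # The event dispatcher delivers `payload` to the workflow waiting on `event`.
        workflow = self.listeners.pop(event)
        self._advance(workflow, payload)

    def _advance(self, workflow, value):
        try:
            request = workflow.send(value)  # resume at the last yield
        except StopIteration as done:
            self.results.append(done.value)  # workflow finished
        else:
            self.listeners[request.event] = workflow  # register the listener


engine = Engine()
engine.start(approval_workflow(42))
engine.fire("approved:42", "alice")
engine.fire("shipped:42", "courier")
print(engine.results)  # prints: ['order 42: approved by alice, shipped via courier']
```

The suspended generators live only in memory, so a restart of the engine loses them.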

The Question

Can this be done with an unmodified Python interpreter? Even better, can anyone point me to some documentation that might cover this sort of thing or an example of code that programmatically constructs a stack frame and starts execution somewhere in the middle of a block of code?

Edit: To clarify 'unmodified Python interpreter': I don't mind using the C API (is there enough information in a PyThreadState to do this?), but I don't want to go poking around the internals of the Python interpreter and have to build a modified one.

Update: From some initial investigation, one can get the execution context with PyThreadState_Get(). This returns the thread state as a PyThreadState (defined in pystate.h), whose frame field references the current stack frame. A stack frame is held in a struct typedef'd to PyFrameObject, which is defined in frameobject.h. PyFrameObject has a field f_lasti (props to bobince), which holds the program counter expressed as an offset from the beginning of the code block. (CPython 3.11 later removed the frame member from PyThreadState and made the PyFrameObject fields private, in favour of accessor functions such as PyThreadState_GetFrame() and PyFrame_GetLasti().)
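The same fields are visible from Python without the C API. Here is a minimal sketch using `sys._getframe()`, which is a CPython implementation detail rather than a portable API, to read the caller's code object, `f_lasti` and current line. The helper names `where` and `workflow_step` are illustrative.

```python
import sys


def where():
    """Return (function name, f_lasti, f_lineno) for the caller's frame."""
    frame = sys._getframe(1)  # 1 = one level up the stack, i.e. the caller
    return frame.f_code.co_name, frame.f_lasti, frame.f_lineno


def workflow_step():
    # While where() runs, this frame is suspended inside the call, so its
    # f_lasti is a byte offset into workflow_step's bytecode near that call.
    return where()


name, lasti, lineno = workflow_step()
print(name, lasti, lineno)
```

Reading these values is easy; there is no corresponding Python-level way to start a frame at a chosen `f_lasti`.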

This last point is good news of a sort, because it means that as long as you preserve the actual compiled code block, you should be able to reconstruct locals for as many stack frames as necessary and restart the code. I'd say this means it is theoretically possible without a modified Python interpreter, although the code is still likely to be fiddly and tightly coupled to specific versions of the interpreter.
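Preserving the compiled code block itself is the easy part: the standard `marshal` module can serialise code objects, although its format is only guaranteed to be readable by the same CPython version that wrote it. A minimal sketch, where the function `step` is illustrative:

```python
import marshal
import types


def step(x, y):
    total = x + y
    return total * 2


# Serialise the compiled code object (valid only for this interpreter version).
blob = marshal.dumps(step.__code__)

# Later, e.g. after a restart on the same version: rebuild a function around it.
code = marshal.loads(blob)
restored = types.FunctionType(code, globals(), "step")
print(restored(3, 4))  # prints: 14
```

This restores the code, not a frame: the restored function can only be called from the top, not entered at a saved `f_lasti`.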

The three remaining problems are:

  • Transaction state and 'saga' rollback, which can probably be accomplished by the sort of metaclass hacking one would use to build an O/R mapper. I did build a prototype once, so I have a fair idea of how this might be accomplished.

  • Robustly serialising transaction state and arbitrary locals. This might be accomplished by reading f_locals (which is available from the stack frame) and programmatically constructing a call to pickle. However, I don't know what gotchas, if any, there might be here.

  • Versioning and upgrade of workflows. This is somewhat trickier, as the system does not provide any symbolic anchors for workflow nodes. All we have is the bytecode offset as an anchor. To support upgrades, one would have to identify the offsets of all of the entry points and map them to the new version. That is probably feasible to do manually, but I suspect it would be hard to automate. This is probably the biggest obstacle if you want to support this capability.
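One concrete gotcha with pickling a frame's `f_locals` is that some common values cannot be pickled at all. A minimal sketch that sorts a frame's locals into picklable and unpicklable ones; the helper `split_picklable_locals` and the example `workflow` are hypothetical:

```python
import pickle
import sys
import threading


def split_picklable_locals(frame):
    """Split frame.f_locals into a picklable dict and a list of problem names."""
    picklable, problems = {}, []
    for name, value in frame.f_locals.items():
        try:
            pickle.dumps(value)
        except Exception:  # TypeError, PicklingError, AttributeError, ...
            problems.append(name)
        else:
            picklable[name] = value
    return picklable, problems


def workflow():
    order_id = 42
    approvers = ["alice", "bob"]
    lock = threading.Lock()  # OS-level resource: not picklable
    pending = (a for a in approvers)  # suspended generator: not picklable
    return split_picklable_locals(sys._getframe())


saved, unpicklable = workflow()
print(saved, sorted(unpicklable))
```

Anything in the second list (locks, open files, sockets, generators) would need an application-specific way to be rebuilt on restore.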

Update 2: PyCodeObject (code.h) has a mapping from bytecode addresses (f_lasti) to line numbers in PyCodeObject.co_lnotab (correct me if I'm wrong here). This might be used to support a migration process that updates workflows to a new version, as frozen instruction pointers could be mapped to the appropriate place in the new script in terms of line numbers. Still quite messy, but a little more promising.
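Rather than decoding `co_lnotab` by hand (its encoding has changed between versions, and newer versions deprecate it in favour of `co_lines()`), the standard `dis.findlinestarts()` function yields `(offset, line)` pairs for a code object. A minimal sketch of mapping a frozen `f_lasti` back to a source line; `offset_to_line`, `where` and `workflow` are illustrative helpers:

```python
import dis
import sys


def offset_to_line(code, offset):
    """Map a bytecode offset (such as a frame's f_lasti) to a source line number."""
    line = None
    for start, lineno in dis.findlinestarts(code):  # pairs in increasing offset order
        if start > offset:
            break
        line = lineno  # approximate for instructions that have no source line
    return line


def where():
    frame = sys._getframe(1)  # the caller's frame
    return frame.f_code, frame.f_lasti, frame.f_lineno


def workflow():
    return where()


code, lasti, lineno = workflow()
print(offset_to_line(code, lasti), lineno)
```

Line numbers give a somewhat more stable anchor than raw offsets, but an edited script can still move or merge lines, so a migration would need some manual mapping.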

Update 3: I think the answer to this might be Stackless Python. You can suspend tasks and serialise them. I haven't yet worked out whether this also works for the stack.

7 answers
beautiful°
#2 · 2019-01-21 11:48

You could grab the existing stack frame by throwing an exception and stepping back one frame in the traceback. The problem is that there is no provided way to resume execution in the middle of the code block (at frame.f_lasti).
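The traceback trick looks roughly like this (`sys._getframe(1)` does the same thing more directly). The helper name `caller_frame` is illustrative. The frame you get back can only be inspected: nothing in the language lets you jump back into it at `f_lasti`.

```python
import sys


def caller_frame():
    """Return the frame of whoever called this function, via a traceback."""
    try:
        raise RuntimeError  # any exception will do
    except RuntimeError:
        tb = sys.exc_info()[2]
        # tb.tb_frame is this function's frame; f_back is the caller's frame.
        return tb.tb_frame.f_back


def workflow():
    counter = 3
    frame = caller_frame()
    return frame.f_code.co_name, frame.f_locals["counter"], frame.f_lasti


name, counter, lasti = workflow()
print(name, counter)  # prints: workflow 3
```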

“Resumable exceptions” are a really interesting language idea, although it's tricky to think of a reasonable way they could interact with Python's existing ‘try/finally’ and ‘with’ blocks.

For the moment, the normal way of doing this is simply to use threads to run your workflow in a separate context to its controller. (Or coroutines/greenlets if you don't mind compiling them in).
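A minimal sketch of the thread-based approach, assuming the workflow's blocking `wait` is backed by one `queue.Queue` per event name that the controller feeds. `ThreadedWorkflow` and its methods are hypothetical names. The workflow's real stack stays alive in its own thread while it waits, but unlike what the question asks for, that state is lost if the process restarts.

```python
import queue
import threading


class ThreadedWorkflow:
    """Run a workflow script in its own thread; its wait() blocks until signalled."""

    def __init__(self, script):
        self._queues = {}  # event name -> queue.Queue of payloads
        self._lock = threading.Lock()
        self.result = None
        self._thread = threading.Thread(target=self._run, args=(script,))
        self._thread.start()

    def _queue(self, name):
        with self._lock:  # either thread may be the first to ask for this queue
            if name not in self._queues:
                self._queues[name] = queue.Queue()
            return self._queues[name]

    def _run(self, script):
        # The script's whole call stack lives in this thread while it is blocked.
        self.result = script(self.wait)

    def wait(self, name):
        """Called by the script: block until the controller signals `name`."""
        return self._queue(name).get()

    def signal(self, name, payload):
        """Called by the controller: wake the script waiting on `name`."""
        self._queue(name).put(payload)

    def join(self):
        self._thread.join()


def order_script(wait):
    approver = wait("approved")
    carrier = wait("shipped")
    return f"approved by {approver}, shipped via {carrier}"


flow = ThreadedWorkflow(order_script)
flow.signal("approved", "alice")
flow.signal("shipped", "courier")
flow.join()
print(flow.result)  # prints: approved by alice, shipped via courier
```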

forever°为你锁心
#3 · 2019-01-21 11:49

How about using joblib?

I'm not quite sure this is what you want, but it seems to fit the idea of having a workflow whose stages can be persisted. Joblib's use case seems to be avoiding recomputation; I'm not sure if that is what you are trying to do here or something more complicated.
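joblib's `Memory` cache stores function results on disk so that a re-run skips stages that already completed. To show the idea without depending on joblib, here is a standard-library sketch of the same pattern. `persistent_stage` is a hypothetical decorator, not joblib's API, and it only handles picklable positional arguments.

```python
import functools
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # assumption: any writable directory works


def persistent_stage(func):
    """Cache func's result on disk, keyed by its name and pickled arguments."""

    @functools.wraps(func)
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__qualname__, args))).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # stage already done: reuse its result
        result = func(*args)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

    return wrapper


calls = []


@persistent_stage
def expensive_stage(n):
    calls.append(n)
    return sum(range(n))


print(expensive_stage(10), expensive_stage(10), calls)  # prints: 45 45 [10]
```

This persists the results of finished stages, not a suspended stage, so it fits workflows that can be re-run from the top cheaply.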

冷血范
#4 · 2019-01-21 11:53

The expat Python bindings included in the standard Python distribution construct stack frames programmatically. Be warned, though: they rely on undocumented and private APIs.

http://svn.python.org/view/python/trunk/Modules/pyexpat.c?rev=64048&view=auto

男人必须洒脱
#5 · 2019-01-21 11:57

I have the same type of problem to solve. I wonder what the original poster decided to do.

Stackless claims it can pickle tasklets as long as there's no associated 'encumbered' C stack (encumbered is my choice of phrasing).

I'll probably use eventlet and figure out some way of pickling 'state'. I really don't want to write an explicit state machine, though.
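The reason 'pickling state' is hard with coroutine-style code in standard CPython is that a suspended generator, like a frame, cannot be pickled, so the state has to be pulled out into plain data explicitly. A minimal demonstration:

```python
import pickle


def workflow():
    approver = yield "wait:approved"
    yield f"wait:shipped (approved by {approver})"


gen = workflow()
print(next(gen))  # prints: wait:approved  (gen is now suspended at the first yield)

pickle_error = None
try:
    pickle.dumps(gen)
except TypeError as exc:  # the generator's suspended frame cannot be pickled
    pickle_error = exc
print(repr(pickle_error))
```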

混吃等死
#6 · 2019-01-21 11:58

With standard CPython this is complicated by the mixture of C and Python data in the stack. Rebuilding the call stack would require the C stack to be reconstructed at the same time. This really puts it in the too hard basket as it could potentially tightly couple the implementation to specific versions of CPython.

Stackless Python allows tasklets to be pickled, which gives most of the capability required out of the box.

看我几分像从前
#7 · 2019-01-21 12:00

What you generally want are continuations, which I see is already a tag on this question.

If you have the ability to work with all of the code in the system, you may want to try doing it this way rather than dealing with the interpreter stack internals. I'm not sure how easily this will be persisted.

http://www.ps.uni-sb.de/~duchier/python/continuations.html
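One way to get persistable continuations without touching interpreter internals is continuation-passing style with named steps. This is a swapped-in technique, not the one in the linked page. A blocking call returns plain data naming the function that continues the workflow plus the state it needs. Storing the step by name rather than as a function object keeps the saved data picklable and gives each resume point a symbolic anchor. `wait`, `continuation`, `resume` and the step functions are hypothetical names.

```python
import pickle

CONTINUATIONS = {}  # symbolic name -> function; the names act as stable anchors


def continuation(func):
    """Register func under its name so a saved workflow can refer to it."""
    CONTINUATIONS[func.__name__] = func
    return func


def wait(event, next_step, state):
    """Hypothetical blocking call: return plain data describing what to do next."""
    return {"event": event, "next": next_step.__name__, "state": state}


def start(order_id):
    return wait("approved", on_approved, {"order_id": order_id})


@continuation
def on_approved(state, approver):
    return wait("shipped", on_shipped, dict(state, approver=approver))


@continuation
def on_shipped(state, carrier):
    return f"order {state['order_id']}: approved by {state['approver']}, sent via {carrier}"


def resume(blob, payload):
    """Load a saved continuation and run it with the event's payload."""
    pending = pickle.loads(blob)
    result = CONTINUATIONS[pending["next"]](pending["state"], payload)
    return pickle.dumps(result)


saved = pickle.dumps(start(42))  # persisted while waiting for "approved"
saved = resume(saved, "alice")  # possibly after a restart
print(pickle.loads(resume(saved, "courier")))
```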

In practice, I would structure your workflow engine so that your script submits action objects to a manager. The manager could pickle the set of actions at any point and allow them to be loaded and begin execution again (by resuming the submission of actions).

In other words: make your own, application-level, stack.
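A minimal sketch of that application-level stack. It assumes a workflow is a list of action objects and that the manager's only execution state is an index into that list plus a plain `state` dict, both of which pickle cleanly. The action list plays the role of the compiled code block and is not persisted. All class and field names are hypothetical.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Wait:
    """Action: suspend until `event` fires, then store its payload under `key`."""
    event: str
    key: str


@dataclass
class Compute:
    """Action: set state[key] = func(state)."""
    key: str
    func: object


# The workflow definition: analogous to the compiled code, so it is not pickled.
WORKFLOW = [
    Wait("approved", "approver"),
    Compute("label", lambda s: f"{s['order_id']}/{s['approver']}"),
    Wait("shipped", "carrier"),
    Compute("summary", lambda s: f"{s['label']} via {s['carrier']}"),
]


class Manager:
    def __init__(self, actions, state, pc=0):
        self.actions, self.state, self.pc = actions, state, pc

    def run(self, payload=None):
        """Execute actions from pc until the next Wait; assumes payloads are never None."""
        while self.pc < len(self.actions):
            action = self.actions[self.pc]
            if isinstance(action, Wait):
                if payload is None:
                    return action.event  # suspended: the caller should save()
                self.state[action.key] = payload
                payload = None  # one payload satisfies only one Wait
            else:
                self.state[action.key] = action.func(self.state)
            self.pc += 1
        return None  # finished

    def save(self):
        # The whole "stack" is just pc + state, both plain data.
        return pickle.dumps({"pc": self.pc, "state": self.state})

    @classmethod
    def load(cls, actions, blob):
        saved = pickle.loads(blob)
        return cls(actions, saved["state"], saved["pc"])


manager = Manager(WORKFLOW, {"order_id": 42})
print(manager.run())  # prints: approved
blob = manager.save()  # persist while waiting

manager = Manager.load(WORKFLOW, blob)  # e.g. after a restart
print(manager.run("alice"))  # prints: shipped
manager = Manager.load(WORKFLOW, manager.save())
manager.run("courier")
print(manager.state["summary"])  # prints: 42/alice via courier
```

Because `pc` is a plain index, inserting actions into a new version of the list would shift saved positions; giving each action a stable name would provide the symbolic anchors the question says are missing.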
