Reading the stdout from slave nodes with ipcluster

Posted 2019-06-27 14:19

I've set up a cluster using

ipcluster start --n=8

then accessed it using

from IPython.parallel import Client
c=Client()
dview=c[:]
e=[i for i in c]

I'm running processes on the slave nodes (e[0]-e[7]) which take a lot of time and I'd like them to send progress reports to the master so I can keep an eye on how far through they are.

There are two ways I can think to do this but so far I haven't been able to implement either of them, despite hours of trawling through question pages.

Either the nodes push data back to the master unprompted, i.e. within the long process that runs on the nodes I call a function which passes its progress to the master at regular intervals.

Or I could redirect the stdout of the nodes to that of the master and then just keep track of the progress using print. This is what I've been working on so far. Each node has its own stdout, so print doesn't show anything on the master when run remotely. I've tried pushing sys.stdout to the nodes, but this just closes it.

I can't believe I'm the only person who wants to do this so maybe I'm missing something very simple. How can I keep track of long processes happening remotely using ipython?

1 answer
疯言疯语
#2 · 2019-06-27 14:51

stdout is already captured, logged, and tracked, and arrives at Clients as it comes, before the result is complete.

IPython ships with an example script that monitors stdout/err of all engines, which can easily be tweaked to only monitor a subset of this information, etc.

In the Client itself, you can check the metadata dict for stdout/err (Client.metadata[msg_id].stdout) before results are done. Use Client.spin() to flush any incoming messages off of the zeromq sockets, to ensure this data is up-to-date.
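As a rough sketch of that polling approach: the helpers below call spin() and then read the stdout field of the metadata entry for each message id. The names poll_stdout and watch are hypothetical, not part of IPython; the code only assumes a Client with spin() and metadata, and an AsyncResult with msg_ids and ready().

```python
import sys
import time


def poll_stdout(client, msg_ids):
    """Return {msg_id: stdout captured so far} for the given tasks."""
    client.spin()  # flush incoming messages off the ZeroMQ sockets first
    return {msg_id: client.metadata[msg_id]['stdout'] for msg_id in msg_ids}


def watch(client, async_result, interval=1.0, out=sys.stdout):
    """Echo new engine stdout to `out` until the result is ready.

    Hypothetical helper: it remembers how much of each task's stdout
    has already been echoed and writes only the new tail.
    """
    offsets = dict.fromkeys(async_result.msg_ids, 0)
    finished = False
    while not finished:
        # Check readiness before polling so one last poll runs after completion.
        finished = async_result.ready()
        captured = poll_stdout(client, async_result.msg_ids)
        for msg_id, text in captured.items():
            new_text = text[offsets[msg_id]:]
            if new_text:
                out.write(new_text)
                offsets[msg_id] = len(text)
        if not finished:
            time.sleep(interval)
```

With the names from the question, this could be used as ar = dview.apply_async(some_function) followed by watch(c, ar), where some_function stands for whatever long-running job the engines execute. Output from different engines is interleaved in whatever order it is polled.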

If you want stdout to update frequently, make sure you call sys.stdout.flush() to guarantee that the stream is actually published at that point, rather than relying on implicit flushes, which may not happen until the work completes.
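For illustration, an engine-side function that reports progress might look like the sketch below. The function long_task and its parameters are hypothetical stand-ins for the real job; the imports sit inside the function body on the assumption that it will be shipped to engines that have not imported those modules themselves.

```python
def long_task(steps=5, delay=0.0):
    """Hypothetical long-running job that reports progress on stdout."""
    # Import inside the function so the names exist on the engine.
    import sys
    import time

    for i in range(steps):
        time.sleep(delay)  # stand-in for one chunk of real work
        print("step %d of %d done" % (i + 1, steps))
        sys.stdout.flush()  # publish this line now instead of at completion
    return steps
```

With the view from the question, this could be submitted as dview.apply_async(long_task, 10, 1.0), after which each flushed line should become visible in the Client's metadata while the task is still running.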
