Dask dashboard not starting when starting scheduler

Posted 2019-09-02 11:59

Question:

I've set up a distributed system using Dask. When I start the scheduler through the Python API, its log output doesn't mention starting the dashboard, and, as expected, I cannot reach the dashboard at the address where I would expect it to be.

Since bokeh is installed, I'd expect the dashboard to be started. When I start the scheduler from the command line, however, the dashboard starts correctly. Why does starting the scheduler through the Python API not start the dashboard?

Relevant information:

  • python 3.6.7
  • dask 1.0.0
  • dask-glm 0.2.0
  • dask-ml 0.11.0
  • distributed 1.25.1
  • bokeh 1.0.3
  • tornado 5.1.1 (also tried with 4.5)

Scheduler output (via the Python API):

orval$ python3 myscheduler.py
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tcp://10.33.14.65:8786

Code of myscheduler.py:

from distributed import Scheduler
from tornado.ioloop import IOLoop
from threading import Thread
s = Scheduler()
s.start('tcp://:8786')   # Listen on TCP port 8786
loop = IOLoop.current()
loop.start()

Output when starting the scheduler through the command line:

distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tcp://10.33.14.65:8786
distributed.scheduler - INFO -       bokeh at:                     :8787
distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-pg2wz3cg
distributed.scheduler - INFO - -----------------------------------------------

Answer 1:

Firstly, even when starting the scheduler within a Python process, you may want to consider using LocalCluster:

import dask.distributed
cluster = dask.distributed.LocalCluster(processes=False, n_workers=0)  # in-process scheduler, no workers

where you can reach the scheduler as cluster.scheduler, and cluster.scheduler.services includes "bokeh".
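
A minimal sketch of the LocalCluster route, assuming distributed 1.x as in the question, where the dashboard is registered in scheduler.services under the key "bokeh" and each service object exposes a port attribute (both assumptions for newer releases). dashboard_port and main are hypothetical helper names, not part of the Dask API.

```python
def dashboard_port(services, name='bokeh'):
    """Return the port of the named dashboard service, or None if absent.

    Assumes distributed 1.x, where the dashboard lives in
    scheduler.services under 'bokeh' and each service has a .port.
    """
    service = services.get(name)
    return None if service is None else service.port


def main():
    # Import inside main() so dashboard_port can be used without dask installed.
    from dask.distributed import LocalCluster

    # In-process scheduler with no workers, as suggested in the answer.
    cluster = LocalCluster(processes=False, n_workers=0)
    try:
        print('Scheduler at:', cluster.scheduler.address)
        print('Dashboard port:', dashboard_port(cluster.scheduler.services))
    finally:
        cluster.close()
```

Calling main() prints the scheduler address and, if bokeh is installed, the dashboard port; a None port means the dashboard service was not started.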

If you instantiate Scheduler directly, as you are doing, you need to pass the services= keyword argument to include the Bokeh dashboard plugin. The class to use is distributed.bokeh.scheduler.BokehScheduler, with something like:

from distributed.bokeh.scheduler import BokehScheduler
services = {('bokeh', 8787): (BokehScheduler, {})}  # 8787: default dashboard port
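
Applied to the question's script, the fix could look like the sketch below. It assumes distributed 1.25.x as listed in the question (the distributed.bokeh import path is specific to the 1.x releases and may not exist in newer versions) and the default dashboard port 8787 seen in the command-line output. dashboard_services and main are hypothetical helper names.

```python
def dashboard_services(dashboard_cls, port=8787, options=None):
    """Build the services= mapping: {(name, port): (service class, kwargs)}.

    'bokeh' is the service name used by distributed 1.x; 8787 is the
    dashboard port shown in the command-line output (assumption).
    """
    return {('bokeh', port): (dashboard_cls, options or {})}


def main():
    # Imports live inside main() so the helper above can be used
    # (and tested) without distributed or bokeh installed.
    from distributed import Scheduler
    from distributed.bokeh.scheduler import BokehScheduler
    from tornado.ioloop import IOLoop

    s = Scheduler(services=dashboard_services(BokehScheduler))
    s.start('tcp://:8786')      # listen on TCP port 8786, as in the question
    IOLoop.current().start()    # blocks; the dashboard should be on :8787
```

Calling main() should then log a "bokeh at: :8787" line, as the command-line scheduler does.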

Did you want to do something in particular with the loop and thread you created? If so, please be more specific about what you want to achieve.