Web Interface

Dask Bokeh Dashboard

Information about the current state of the network helps to track progress, identify performance issues, and debug failures.

Dask.distributed includes a web interface to help deliver this information over a normal web page in real time. This web interface is launched by default wherever the scheduler is launched if the scheduler machine has Bokeh installed (conda install bokeh -c bokeh).

List of Servers

There are a few sets of diagnostic pages served at different ports:

  • Main Scheduler pages at http://scheduler-address:8787. These pages, particularly the /status page are the main page that most people associate with Dask. These pages are served from a separate standalone Bokeh server application running in a separate process.
  • Debug Scheduler pages at http://scheduler-address:8788. These pages have more detailed diagnostic information about the scheduler. They are more often used by developers than by users, but may still be of interest to the performance-conscious. These pages run from inside the scheduler process, and so compete for resources with the main scheduler.
  • Debug Worker pages for each worker at http://worker-address:8789. These pages have detailed diagnostic information about the worker. Like the diagnostic scheduler pages they are of more utility to developers or to people looking to understand the performance of their underlying cluster. If port 8789 is unavailable (for example it is in use by another worker) then a random port is chosen. A list of all ports can be obtained from looking at the service ports for each worker in the result of calling client.scheduler_info()

The rest of this document will be about the main pages at http://scheduler-address:8787.

The available pages are http://scheduler-address:8787/<page>/ where <page> is one of

  • status: a stream of recently run tasks, progress bars, resource use
  • tasks: a larger stream of the last 100k tasks
  • workers: basic information about workers and their current load

Plots

Example Computation

The following plots show a trace of the following computation:

from distributed import Client
from time import sleep
import random

def inc(x):
    sleep(random.random() / 10)
    return x + 1

def dec(x):
    sleep(random.random() / 10)
    return x - 1

def add(x, y):
    sleep(random.random() / 10)
    return x + y


client = Client('127.0.0.1:8786')

incs = client.map(inc, range(100))
decs = client.map(dec, range(100))
adds = client.map(add, incs, decs)
total = client.submit(sum, adds)

del incs, decs, adds
total.result()

Progress

The interface shows the progress of the various computations as well as the exact number completed.

Resources view of Dask web interface

Each bar is assigned a color according to the function being run. Each bar has a few components. On the left the lighter shade is the number of tasks that have both completed and have been released from memory. The darker shade to the right corresponds to the tasks that are completed and whose data still reside in memory. If errors occur then they appear as a black colored block to the right.

Typical computations may involve dozens of kinds of functions. We handle this visually with the following approaches:

  1. Functions are ordered by the number of total tasks
  2. The colors are assigned in a round-robin fashion from a standard palette
  3. The progress bars shrink horizontally to make space for more functions
  4. Only the largest functions (in terms of number of tasks) are displayed
Progress bar plot of Dask web interface

Counts of tasks processing, waiting for dependencies, processing, etc.. are displayed in the title bar.

Memory Use

The interface shows the relative memory use of each function with a horizontal bar sorted by function name.

Memory use plot of Dask web interface

The title shows the number of total bytes in use. Hovering over any bar tells you the specific function and how many bytes its results are actively taking up in memory. This does not count data that has been released.

Task Stream

The task stream plot shows when tasks complete on which workers. Worker cores are on the y-axis and time is on the x-axis. As a worker completes a task its start and end times are recorded and a rectangle is added to this plot accordingly.

Task stream plot of Dask web interface

If data transfer occurs between workers a red bar appears preceding the task bar showing the duration of the transfer. If an error occurs than a black bar replaces the normal color. This plot show the last 1000 tasks. It resets if there is a delay greater than 10 seconds.

For a full history of the last 100,000 tasks see the tasks/ page.

Resources

The resources plot show the average CPU and Memory use over time as well as average network traffic. More detailed information on a per-worker basis is available in the workers/ page.

Resources view of Dask web interface

Connecting to Web Interface

Default

By default, dask-scheduler prints out the address of the web interface:

INFO -  Bokeh UI at:  http://10.129.39.91:8787/status
...
INFO - Starting Bokeh server on port 8787 with applications at paths ['/status', '/tasks']

The machine hosting the scheduler runs an HTTP server serving at that address.

Troubleshooting

Some clusters restrict the ports that are visible to the outside world. These ports may include the default port for the web interface, 8787. There are a few ways to handle this:

  1. Open port 8787 to the outside world. Often this involves asking your cluster administrator.
  2. Use a different port that is publicly accessible using the --bokeh-port PORT option on the dask-scheduler command.
  3. Use fancier techniques, like Port Forwarding

Running distributed on a remote machine can cause issues with viewing the web UI – this depends on the remote machines network configuration.

Port Forwarding

If you have SSH access then one way to gain access to a blocked port is through SSH port forwarding. A typical use case looks like the following:

local$ ssh -L 8000:localhost:8787 user@remote
remote$ dask-scheduler  # now, the web UI is visible at localhost:8000
remote$ # continue to set up dask if needed -- add workers, etc

It is then possible to go to localhost:8000 and see Dask Web UI. This same approach is not specific to dask.distributed, but can be used by any service that operates over a network, such as Jupyter notebooks. For example, if we chose to do this we could forward port 8888 (the default Jupyter port) to port 8001 with ssh -L 8001:localhost:8888 user@remote.