Configuration

As with any distributed computation system, taking full advantage of Dask distributed sometimes requires configuration. Some options can be passed as API parameters and/or command line options to the various Dask executables. However, some options can also be entered in the Dask configuration file.

User-wide configuration

Dask accepts some configuration options in a configuration file, which by default is a .dask/config.yaml file located in your home directory. The file path can be overriden using the DASK_CONFIG environment variable.

The file is written in the YAML format, which allows for a human-readable hierarchical key-value configuration. All keys in the configuration file are optional, though Dask will create a default configuration file for you on its first launch.

Here is a synopsis of the configuration file:

logging:
    distributed: info
    distributed.client: warning
    bokeh: critical

# Scheduler options
bandwidth: 100000000    # 100 MB/s estimated worker-worker bandwidth
allowed-failures: 3     # number of retries before a task is considered bad
pdb-on-err: False       # enter debug mode on scheduling error
transition-log-length: 100000

# Worker options
multiprocessing-method: forkserver

# Communication options
compression: auto
tcp-timeout: 30         # seconds delay before calling an unresponsive connection dead
default-scheme: tcp
require-encryption: False   # whether to require encryption on non-local comms
tls:
    ca-file: myca.pem
    scheduler:
        cert: mycert.pem
        key: mykey.pem
    worker:
        cert: mycert.pem
        key: mykey.pem
    client:
        cert: mycert.pem
        key: mykey.pem
    #ciphers:
        #ECDHE-ECDSA-AES128-GCM-SHA256

# Bokeh web dashboard
bokeh-export-tool: False

We will review some of those options hereafter.

Communication options

compression

This key configures the desired compression scheme when transferring data over the network. The default value, “auto”, applies heuristics to try and select the best compression scheme for each piece of data.

default-scheme

The communication scheme used by default. You can override the default (“tcp”) here, but it is recommended to use explicit URIs for the various endpoints instead (for example tls:// if you want to enable TLS communications).

require-encryption

Whether to require that all non-local communications be encrypted. If true, then Dask will refuse establishing any clear-text communications (for example over TCP without TLS), forcing you to use a secure transport such as TLS.

tcp-timeout

The default “timeout” on TCP sockets. If a remote endpoint is unresponsive (at the TCP layer, not at the distributed layer) for at least the specified number of seconds, the communication is considered closed. This helps detect endpoints that have been killed or have disconnected abruptly.

tls

This key configures TLS communications. Several sub-keys are recognized:

  • ca-file configures the CA certificate file used to authenticate and authorize all endpoints.
  • ciphers restricts allowed ciphers on TLS communications.

Each kind of endpoint has a dedicated endpoint sub-key: scheduler, worker and client. Each endpoint sub-key also supports several sub-keys:

  • cert configures the certificate file for the endpoint.
  • key configures the private key file for the endpoint.

Scheduler options

allowed-failures

The number of retries before a “suspicious” task is considered bad. A task is considered “suspicious” if the worker died while executing it.

bandwidth

The estimated network bandwidth, in bytes per second, from worker to worker. This value is used to estimate the time it takes to ship data from one node to another, and balance tasks and data accordingly.

Misc options

logging

This key configures the desired verbosity level for each logger. Note that the Python logging module uses a hierarchical logger tree, so for example configuration the logging level for the distributed logger will also affect its children such as distributed.scheduler, unless explicitly overriden.