Dask Gateway

Dask Gateway provides a secure, multi-tenant server for managing Dask clusters. It allows users to launch and use Dask clusters in a shared, centrally managed cluster environment, without requiring users to have direct access to the underlying cluster backend (e.g. Kubernetes, Hadoop/YARN, or HPC job queues).

Dask Gateway is one of many options for deploying Dask clusters; see Deploying Dask in the Dask documentation for an overview of the alternatives.

Highlights

  • Centrally Managed: Administrators do the heavy lifting of configuring the Gateway; users simply connect to the Gateway to get a new cluster. This eases deployment and allows consistent configuration to be enforced across all users.

  • Secure by Default: Cluster communication is automatically encrypted with TLS. All operations are authenticated with a configurable protocol, allowing you to use what makes sense for your organization.

  • Flexible: The gateway is designed to support multiple backends, and runs equally well in the cloud or on-premises. It natively supports Kubernetes, Hadoop/YARN, and HPC job queueing systems.

  • Robust to Failure: The gateway can be restarted or experience failover without losing existing clusters. This allows for seamless upgrades and restarts without disrupting users.
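From the user's side, "connecting to the Gateway to get a new cluster" can look roughly like the sketch below. It uses the `dask_gateway` client library; the helper name `launch_cluster` and its `n_workers` default are illustrative assumptions, and the gateway address and authentication method depend on how your administrator has configured the deployment.

```python
def launch_cluster(address: str, n_workers: int = 2):
    """Connect to a gateway, start a cluster, and return (cluster, client).

    `launch_cluster` is a hypothetical helper for illustration; it is not
    part of dask-gateway itself.
    """
    # Imported lazily so this module still loads where dask-gateway
    # is not installed.
    from dask_gateway import Gateway

    # Connect to the gateway server. Authentication follows whatever
    # method the client has been configured to use.
    gateway = Gateway(address)

    # Ask the gateway to start a new cluster on the backend it manages.
    cluster = gateway.new_cluster()

    # Request a fixed number of workers.
    cluster.scale(n_workers)

    # Obtain a dask.distributed Client connected to the new cluster.
    client = cluster.get_client()
    return cluster, client
```

Calling `launch_cluster("http://gateway.example.com")` (a placeholder address) returns a cluster handle and a client; call `cluster.shutdown()` when the work is done so the resources are released.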

Architecture Overview

Dask Gateway is divided into three separate components:

  • Multiple active Dask Clusters (potentially more than one per user)

  • A Proxy that forwards both the connection between the user’s client and their respective scheduler, and the Dask Web UI for each cluster

  • A central Gateway that manages authentication and cluster startup/shutdown

Figure: Dask-Gateway high-level architecture

The gateway is designed to be flexible and pluggable, and makes heavy use of traitlets (the same technology used by the Jupyter ecosystem). In particular, both the cluster backend and the authentication protocol are pluggable.
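To make the idea of pluggable components concrete, here is a toy model in plain Python. It is not Dask Gateway's implementation or API: every class name in it (`ToyGateway`, `TokenAuthenticator`, `InMemoryBackend`, and the two base classes) is hypothetical, and it leaves out traitlets, the proxy, and real cluster processes. It only shows how a central gateway can delegate authentication and cluster startup/shutdown to interchangeable objects.

```python
import itertools


class Authenticator:
    """Hypothetical interface: map request credentials to a user name."""

    def authenticate(self, credentials: dict) -> str:
        raise NotImplementedError


class TokenAuthenticator(Authenticator):
    """Toy authenticator that looks up a shared-secret token."""

    def __init__(self, tokens: dict):
        self.tokens = tokens  # maps token -> user name

    def authenticate(self, credentials: dict) -> str:
        try:
            return self.tokens[credentials["token"]]
        except KeyError:
            raise PermissionError("unknown or missing token") from None


class Backend:
    """Hypothetical interface: start and stop clusters for a user."""

    def start_cluster(self, user: str) -> str:
        raise NotImplementedError

    def stop_cluster(self, cluster_id: str) -> None:
        raise NotImplementedError


class InMemoryBackend(Backend):
    """Toy backend that only records which clusters exist."""

    def __init__(self):
        self.clusters = {}  # maps cluster id -> owning user
        self._ids = itertools.count(1)

    def start_cluster(self, user: str) -> str:
        cluster_id = f"{user}-{next(self._ids)}"
        self.clusters[cluster_id] = user
        return cluster_id

    def stop_cluster(self, cluster_id: str) -> None:
        self.clusters.pop(cluster_id, None)


class ToyGateway:
    """Central component: authenticates, then delegates to the backend."""

    def __init__(self, authenticator: Authenticator, backend: Backend):
        self.authenticator = authenticator
        self.backend = backend

    def new_cluster(self, credentials: dict) -> str:
        user = self.authenticator.authenticate(credentials)
        return self.backend.start_cluster(user)

    def stop_cluster(self, credentials: dict, cluster_id: str) -> None:
        # Authenticate first so anonymous requests cannot stop clusters.
        self.authenticator.authenticate(credentials)
        self.backend.stop_cluster(cluster_id)
```

Because `ToyGateway` depends only on the two small interfaces, swapping the token check for another authentication scheme, or the in-memory store for a real resource manager, would not require changing the gateway class.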

Cluster Backends

Authentication Methods