Install on a Kubernetes Cluster

Here we provide instructions for installing and configuring dask-gateway-server on a Kubernetes cluster.

Architecture

When running on Kubernetes, Dask Gateway is composed of the following components:

  • Multiple active Dask Clusters (potentially more than one per user)

  • A Traefik Proxy for proxying both the connection between the user’s client and their respective scheduler, and the Dask Web UI for each cluster

  • A Gateway API Server that handles user API requests

  • A Gateway Controller for managing the Kubernetes objects used by each cluster (e.g. pods, secrets, etc.)

[Figure: Dask-Gateway high-level Kubernetes architecture]

Both the Traefik Proxy deployment and the Gateway API Server deployment can be scaled to multiple replicas, for increased availability and scalability.
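
For example, here is a config.yaml sketch that raises both replica counts (the values are illustrative; both fields appear in the default values.yaml reproduced in the Helm chart reference below):

gateway:
  # Run three replicas of the gateway API server
  replicas: 3
traefik:
  # Run three replicas of the Traefik proxy
  replicas: 3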

Create a Kubernetes Cluster (optional)

If you don’t already have a cluster running, you’ll want to create one. There are plenty of guides online for how to do this. We recommend following the excellent documentation provided by zero-to-jupyterhub-k8s.

Install Helm (optional)

If you don’t already have Helm installed, you’ll need to install it locally. As above, there are plenty of instructional materials online for doing this. We recommend following the guide provided by zero-to-jupyterhub-k8s.

Install Dask-Gateway

At this point you should have a Kubernetes cluster with Helm installed and configured. You are now ready to install Dask-Gateway on your cluster.

Add the Helm Chart Repository

To avoid downloading the chart locally from GitHub, you can use the Dask-Gateway Helm chart repository.

$ helm repo add dask https://helm.dask.org/
$ helm repo update

Configuration

The Helm chart provides access to configure most aspects of the dask-gateway-server. These are provided via a configuration YAML file (the name of this file doesn’t matter; we’ll use config.yaml).

The Helm chart exposes many configuration values; see the default values.yaml file (reproduced in the Helm chart reference below) for more information.
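
As a starting point, here is a minimal config.yaml sketch (each field shown exists in the default values.yaml; the override values themselves are illustrative):

gateway:
  # More verbose gateway-server logging
  loglevel: DEBUG
  backend:
    worker:
      # Illustrative per-worker resource limits
      cores:
        limit: 2
      memory:
        limit: 4G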

Install the Helm Chart

To install the Dask-Gateway Helm chart, run the following command:

RELEASE=dask-gateway
NAMESPACE=dask-gateway
VERSION=0.9.0

helm upgrade --install \
    --namespace $NAMESPACE \
    --version $VERSION \
    --values path/to/your/config.yaml \
    $RELEASE \
    dask/dask-gateway

where:

  • RELEASE is the Helm release name to use (we suggest dask-gateway, but any release name is fine).

  • NAMESPACE is the Kubernetes namespace to install the gateway into (we suggest dask-gateway, but any namespace is fine).

  • VERSION is the Helm chart version to use. To use the latest published version you can omit the --version flag entirely. See the Helm chart repository for an index of all available versions.

  • path/to/your/config.yaml is the path to your config.yaml file created above.

Running this command may take some time, as resources are created and images are downloaded. When everything is ready, running the following command will show the EXTERNAL-IP address for the LoadBalancer service (the traefik-* row in the output below).

$ kubectl get service --namespace dask-gateway
NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
api-<RELEASE>-dask-gateway        ClusterIP      10.51.245.233   <none>           8000/TCP         6m54s
traefik-<RELEASE>-dask-gateway    LoadBalancer   10.51.247.160   146.148.58.187   80:30304/TCP     6m54s

You can also check to make sure the daskcluster CRD has been installed successfully:

$ kubectl get daskcluster -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

At this point, you have a fully running dask-gateway-server.

Connecting to the gateway

To connect to the running dask-gateway-server, you’ll need the external IP of the traefik-* service above. The Traefik service handles API requests, proxies the Dask dashboards, and proxies TCP traffic between Dask clients and schedulers. (You can also choose to have Traefik handle scheduler traffic over a separate port; see the Helm chart reference.)
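
If you want Traefik to serve scheduler TCP traffic on its own port, a config.yaml sketch like the following enables that (the port number is illustrative; see traefik.service.ports.tcp in the reference below):

traefik:
  service:
    ports:
      tcp:
        # Serve scheduler TCP traffic on a dedicated port instead of
        # sharing the web port
        port: 8786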

To connect, create a dask_gateway.Gateway object, specifying the Traefik address (if Traefik serves scheduler TCP traffic on a separate port, that address goes under proxy_address). Using the same values as above:

>>> from dask_gateway import Gateway
>>> gateway = Gateway(
...     "http://146.148.58.187",
... )

You should now be able to use the gateway client to make API calls. To verify this, call dask_gateway.Gateway.list_clusters(). This should return an empty list, as you have no clusters running yet.

>>> gateway.list_clusters()
[]

Shutting everything down

When you’re done with the gateway, you’ll want to delete your deployment and clean everything up. You can do this with helm delete:

$ helm delete --namespace $NAMESPACE $RELEASE

Additional configuration

Here we provide a few configuration snippets for common deployment scenarios. For all available configuration fields see the Helm chart reference.

Using a custom image

By default, schedulers and workers started by dask-gateway will use the daskgateway/dask-gateway image. This is a basic image with only the minimal dependencies installed. To use a custom image, you can configure the following (see the snippet after this list):

  • gateway.backend.image.name: the default image name

  • gateway.backend.image.tag: the default image tag
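
For example, a config.yaml snippet selecting a custom image (the image name and tag below are placeholders for your own):

gateway:
  backend:
    image:
      # Placeholder image name and tag; substitute your own
      name: registry.example.com/my-org/my-dask-image
      tag: "2021.2.0"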

For an image to work with dask-gateway, it must have a compatible version of dask-gateway installed (we recommend always using the same version as deployed on the dask-gateway-server).

Additionally, we recommend using an init process in your images. This isn’t strictly required, but running without an init process may lead to odd worker behaviors. We recommend using tini, but any init process should be fine.

There are no other requirements for images; any image that meets the above should work fine. You may install any additional libraries or dependencies you require.

To develop your own image, you may either base it on a compatible version of daskgateway/dask-gateway, or use our example Dockerfile as a reference and develop your own.

Using extraPodConfig/extraContainerConfig

The Kubernetes API is large, and not all configuration fields you may want to set on scheduler/worker pods are directly exposed by the Helm chart. To address this, we provide a few fields for forwarding configuration directly to the underlying Kubernetes objects:

  • gateway.backend.scheduler.extraPodConfig

  • gateway.backend.scheduler.extraContainerConfig

  • gateway.backend.worker.extraPodConfig

  • gateway.backend.worker.extraContainerConfig

These allow configuring any unexposed fields on the scheduler and worker pods/containers. Each takes a mapping of key-value pairs, which is deep-merged with any settings set by dask-gateway itself (with preference given to the extra*Config values). Note that keys should be camelCase (rather than snake_case) to match those in the Kubernetes API.

For example, this can be useful for setting things like tolerations or node affinities on scheduler or worker pods. Here we configure a node affinity that keeps scheduler pods off preemptible nodes:

gateway:
  backend:
    scheduler:
      extraPodConfig:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                  - key: cloud.google.com/gke-preemptible
                    operator: DoesNotExist

For information on allowed fields, see the Kubernetes API documentation for the PodSpec and Container objects.

Using extraConfig

Not all configuration options have been exposed via the Helm chart. To set unexposed options, you can use the gateway.extraConfig field. This takes either:

  • A single python code-block (as a string) to append to the end of the generated dask_gateway_config.py file.

  • A map of keys -> code-blocks (recommended). When applied in this form, code-blocks are appended in alphabetical order by key (the keys themselves are meaningless). This allows merging multiple values.yaml files together, as Helm can natively merge maps (see the sketch after this list).
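
To illustrate the merge behavior, here is a sketch using two hypothetical values files, each contributing one code-block (the file names, keys, and block contents are placeholders):

# base.yaml
gateway:
  extraConfig:
    10-first: |
      # python code appended first (keys sort alphabetically)

# overrides.yaml
gateway:
  extraConfig:
    20-second: |
      # python code appended second

Passing both files to helm merges the two maps, so both code-blocks end up in the generated dask_gateway_config.py:

$ helm upgrade --install --values base.yaml --values overrides.yaml $RELEASE dask/dask-gateway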

For example, here we use gateway.extraConfig to set c.Backend.cluster_options, exposing options for worker resources and image (see Exposing Cluster Options for more information).

gateway:
  extraConfig:
    # Note that the key name here doesn't matter. Values in the
    # `extraConfig` map are concatenated, sorted by key name.
    clusteroptions: |
        from dask_gateway_server.options import Options, Integer, Float, String

        def option_handler(options):
            return {
                "worker_cores": options.worker_cores,
                "worker_memory": "%fG" % options.worker_memory,
                "image": options.image,
            }

        c.Backend.cluster_options = Options(
            Integer("worker_cores", 2, min=1, max=4, label="Worker Cores"),
            Float("worker_memory", 4, min=1, max=8, label="Worker Memory (GiB)"),
            String("image", default="daskgateway/dask-gateway:latest", label="Image"),
            handler=option_handler,
        )

For information on all available configuration options, see the Configuration Reference (in particular, the KubeClusterConfig section).

Authenticating with JupyterHub

JupyterHub provides a multi-user interactive notebook environment. Through the zero-to-jupyterhub-k8s project, many companies and institutions have set up JupyterHub to run on Kubernetes. When deploying Dask-Gateway alongside JupyterHub, you can configure Dask-Gateway to use JupyterHub for authentication. To do this, we register dask-gateway as a JupyterHub Service.

First we need to generate an API token; this is commonly done using openssl:

$ openssl rand -hex 32

Then add the following lines to your config.yaml file:

gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: "<API TOKEN>"

replacing <API TOKEN> with the output from above.

If you’re not deploying Dask-Gateway in the same cluster and namespace as JupyterHub, you’ll also need to specify JupyterHub’s API URL. This is usually of the form https://<JUPYTERHUB-HOST>:<JUPYTERHUB-PORT>/hub/api. If JupyterHub and Dask-Gateway are on the same cluster and namespace, you can omit this configuration key; the address will be inferred automatically.

gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: "<API TOKEN>"
      apiUrl: "<API URL>"

You’ll also need to add the following to the config.yaml file for your JupyterHub Helm chart.

hub:
  services:
    dask-gateway:
      apiToken: "<API TOKEN>"

again, replacing <API TOKEN> with the output from above.

With this configuration, JupyterHub will be used to authenticate requests between users and the dask-gateway-server. Note that users will need to pass auth="jupyterhub" when they create a dask_gateway.Gateway object.

>>> from dask_gateway import Gateway
>>> gateway = Gateway(
...     "http://146.148.58.187",
...     auth="jupyterhub",
... )

Helm chart reference

The full default values.yaml file for the dask-gateway Helm chart is included here for reference:

# gateway nested config relates to the api Pod and the dask-gateway-server
# running within it, the k8s Service exposing it, as well as the schedulers
# (gateway.backend.scheduler) and workers (gateway.backend.worker) created by the
# controller when a DaskCluster k8s resource is registered.
gateway:
  # Number of instances of the gateway-server to run
  replicas: 1

  # Annotations to apply to the gateway-server pods.
  annotations: {}

  # Resource requests/limits for the gateway-server pod.
  resources: {}

  # Path prefix to serve dask-gateway api requests under
  # This prefix will be added to all routes the gateway manages
  # in the traefik proxy.
  prefix: /

  # The gateway server log level
  loglevel: INFO

  # The image to use for the gateway-server pod.
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent

  # Image pull secrets for gateway-server pod
  imagePullSecrets: []

  # Configuration for the gateway-server service
  service:
    annotations: {}

  auth:
    # The auth type to use. One of {simple, kerberos, jupyterhub, custom}.
    type: simple

    simple:
      # A shared password to use for all users.
      password: null

    kerberos:
      # Path to the HTTP keytab for this node.
      keytab: null

    jupyterhub:
      # A JupyterHub api token for dask-gateway to use. See
      # https://gateway.dask.org/install-kube.html#authenticating-with-jupyterhub.
      apiToken: null

      # JupyterHub's api url. Inferred from JupyterHub's service name if running
      # in the same namespace.
      apiUrl: null

    custom:
      # The full authenticator class name.
      class: null

      # Configuration fields to set on the authenticator class.
      config: {}

  livenessProbe:
    # Enables the livenessProbe. 
    enabled: true
    # Configures the livenessProbe. 
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6
  readinessProbe:
    # Enables the readinessProbe.
    enabled: true
    # Configures the readinessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3

  # backend nested configuration relates to the scheduler and worker resources
  # created for DaskCluster k8s resources by the controller.
  backend:
    # The image to use for both schedulers and workers.
    image:
      name: daskgateway/dask-gateway
      tag: 0.9.0
      pullPolicy: IfNotPresent

    # The namespace to launch dask clusters in. If not specified, defaults to
    # the same namespace the gateway is running in.
    namespace: null

    # A mapping of environment variables to set for both schedulers and workers.
    environment: {}

    scheduler:
      # Any extra configuration for the scheduler pod. Sets
      # `c.KubeClusterConfig.scheduler_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the scheduler container.
      # Sets `c.KubeClusterConfig.scheduler_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for the scheduler.
      cores:
        request: null
        limit: null

      # Memory request/limit for the scheduler.
      memory:
        request: null
        limit: null

    worker:
      # Any extra configuration for the worker pod. Sets
      # `c.KubeClusterConfig.worker_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the worker container. Sets
      # `c.KubeClusterConfig.worker_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for each worker.
      cores:
        request: null
        limit: null

      # Memory request/limit for each worker.
      memory:
        request: null
        limit: null

      # Number of threads available for a worker
      threads: 1

  # Settings for nodeSelector, affinity, and tolerations for the gateway pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

  # Any extra configuration code to append to the generated `dask_gateway_config.py`
  # file. Can be either a single code-block, or a map of key -> code-block
# (code-blocks are run in alphabetical order by key; the key value itself is
  # meaningless). The map version is useful as it supports merging multiple
  # `values.yaml` files, but is unnecessary in other cases.
  extraConfig: {}



# controller nested config relates to the controller Pod and the
# dask-gateway-server running within it that makes things happen when changes to
# DaskCluster k8s resources are observed.
controller:
  # Whether the controller should be deployed. Disabling the controller allows
  # running it locally for development/debugging purposes.
  enabled: true

  # Any annotations to add to the controller pod
  annotations: {}

  # Resource requests/limits for the controller pod
  resources: {}

  # Image pull secrets for controller pod
  imagePullSecrets: []

  # The controller log level
  loglevel: INFO

  # Max time (in seconds) to keep around records of completed clusters.
  # Default is 24 hours.
  completedClusterMaxAge: 86400

  # Time (in seconds) between cleanup tasks removing records of completed
  # clusters. Default is 5 minutes.
  completedClusterCleanupPeriod: 600

  # Base delay (in seconds) for backoff when retrying after failures.
  backoffBaseDelay: 0.1

  # Max delay (in seconds) for backoff when retrying after failures.
  backoffMaxDelay: 300

  # Limit on the average number of k8s api calls per second.
  k8sApiRateLimit: 50

  # Limit on the maximum number of k8s api calls per second.
  k8sApiRateLimitBurst: 100

  # The image to use for the controller pod.
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent

  # Settings for nodeSelector, affinity, and tolerations for the controller pods
  nodeSelector: {}
  affinity: {}
  tolerations: []



# traefik nested config relates to the traefik Pod and Traefik running within it
# that is acting as a proxy for traffic towards the gateway or user created
# DaskCluster resources.
traefik:
  # Number of instances of the proxy to run
  replicas: 1

  # Any annotations to add to the proxy pods
  annotations: {}

  # Resource requests/limits for the proxy pods
  resources: {}

  # The image to use for the proxy pod
  image:
    name: traefik
    tag: "2.5"
    pullPolicy: IfNotPresent
  imagePullSecrets: []

  # Any additional arguments to forward to traefik
  additionalArguments: []

  # The proxy log level
  loglevel: WARN

  # Whether to expose the dashboard on port 9000 (enable for debugging only!)
  dashboard: false

  # Additional configuration for the traefik service
  service:
    type: LoadBalancer
    annotations: {}
    spec: {}
    ports:
      web:
        # The port HTTP(s) requests will be served on
        port: 80
        nodePort: null
      tcp:
        # The port TCP requests will be served on. Set to `web` to share the
        # web service port
        port: web
        nodePort: null

  # Settings for nodeSelector, affinity, and tolerations for the traefik pods
  nodeSelector: {}
  affinity: {}
  tolerations: []



# rbac nested configuration relates to the choice of creating or replacing
# resources like (Cluster)Role, (Cluster)RoleBinding, and ServiceAccount.
rbac:
  # Whether to enable RBAC.
  enabled: true

  # Existing names to use if ClusterRoles, ClusterRoleBindings, and
  # ServiceAccounts have already been created by other means (leave set to
  # `null` to create all required roles at install time)
  controller:
    serviceAccountName: null

  gateway:
    serviceAccountName: null

  traefik:
    serviceAccountName: null



# global nested configuration is accessible by all Helm charts that may depend
# on each other, but not used by this Helm chart. An entry is created here to
# validate its use and catch YAML typos via this configuration's associated JSON
# schema.
global: {}