Install on a Kubernetes Cluster¶
Here we provide instructions for installing and configuring dask-gateway-server on a Kubernetes Cluster.
Architecture¶
When running on Kubernetes, Dask Gateway is composed of the following components:
Multiple active Dask Clusters (potentially more than one per user)
A Traefik Proxy for proxying both the connection between the user’s client and their respective scheduler, and the Dask Web UI for each cluster
A Gateway API Server that handles user API requests
A Gateway Controller for managing the kubernetes objects used by each cluster (e.g. pods, secrets, etc…).
Both the Traefik Proxy deployment and the Gateway API Server deployment can be scaled to multiple replicas, for increased availability and scalability.
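Both replica counts are plain Helm chart values (see the Helm chart reference at the end of this page). As an illustrative sketch only, a Helm values file could scale each to two replicas:
gateway:
  replicas: 2   # Gateway API Server pods
traefik:
  replicas: 2   # Traefik Proxy pods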
The Dask Gateway pods running on Kubernetes include the following:
api: The Gateway API Server
traefik: The Traefik Proxy
controller: The Kubernetes Gateway Controller for managing Dask-Gateway resources
scheduler & worker: User’s Dask Scheduler and Worker
Network communications happen in the following manner:
The traefik pods proxy connections to the api pods on port 8000, and scheduler pods on ports 8786 and 8787.
The api pods send api requests to the scheduler pods over port 8788.
If using JupyterHub Auth., the api pod sends requests to the JupyterHub server to authenticate. Depending on the configuration, requests go directly to the JupyterHub pods through service lookup or through the JupyterHub Proxy.
The worker pods communicate with the scheduler on port 8786.
The traefik pods proxy worker communications on port 8787 for the dashboard.
The worker pods listen for incoming communications on a random high port, which the scheduler opens connections back to. Worker pods also communicate with each other on these random high ports.
The scheduler pods send heartbeat requests to the api server pods using the api service DNS name on port 8000.
The controller pod only communicates to the Kubernetes API and receives no inbound traffic.
Create a Kubernetes Cluster (optional)¶
If you don’t already have a cluster running, you’ll want to create one. There are plenty of guides online for how to do this. We recommend following the excellent documentation provided by zero-to-jupyterhub-k8s.
Install Helm¶
If you don’t already have Helm installed, you’ll need to install it locally. As with above, there are plenty of instructional materials online for doing this. We recommend following the guide provided by zero-to-jupyterhub-k8s.
Install the Dask-Gateway Helm chart¶
At this point you should have a Kubernetes cluster. You are now ready to install the Dask-Gateway Helm chart on your cluster.
Configuration¶
The Helm chart provides access to configure most aspects of the dask-gateway-server. These are provided via a configuration YAML file (the name of this file doesn’t matter; we’ll use config.yaml).
The Helm chart exposes many configuration values; see the default values.yaml file for more information.
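As a starting point, here is an illustrative config.yaml sketch that overrides a few commonly tuned values. The numbers are placeholders rather than recommendations; every field shown appears in the default values.yaml (see the Helm chart reference below).
# config.yaml -- illustrative values only
gateway:
  backend:
    worker:
      cores:
        request: 1   # CPU request per worker
        limit: 2     # CPU limit per worker
      memory:
        request: 2G  # memory request per worker
        limit: 4G    # memory limit per worker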
Install the Helm Chart¶
To install the Dask-Gateway Helm chart, run the following command:
RELEASE=dask-gateway
NAMESPACE=dask-gateway

helm upgrade $RELEASE dask-gateway \
  --repo=https://helm.dask.org \
  --install \
  --namespace $NAMESPACE \
  --values path/to/your/config.yaml
where:
RELEASE is the Helm release name to use (we suggest dask-gateway, but any release name is fine).
NAMESPACE is the Kubernetes namespace to install the gateway into (we suggest dask-gateway, but any namespace is fine).
path/to/your/config.yaml is the path to your config.yaml file created above.
Running this command may take some time, as resources are created and images are downloaded. When everything is ready, running the following command will show the EXTERNAL-IP address of the traefik-* LoadBalancer service (146.148.58.187 in the example output below).
kubectl get service --namespace dask-gateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
api-<RELEASE>-dask-gateway ClusterIP 10.51.245.233 <none> 8000/TCP 6m54s
traefik-<RELEASE>-dask-gateway LoadBalancer 10.51.247.160 146.148.58.187 80:30304/TCP 6m54s
You can also check to make sure the daskcluster CRD has been installed successfully:
kubectl get daskcluster -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
At this point, you have a fully running dask-gateway-server.
Connecting to the gateway¶
To connect to the running dask-gateway-server, you’ll need the external IP of the traefik-* service above. The Traefik service provides access to API requests, proxies out the Dask Dashboards, and proxies TCP traffic between Dask clients and schedulers. (You can also choose to have Traefik handle scheduler traffic over a separate port; see the Helm chart reference.)
To connect, create a dask_gateway.Gateway object, specifying the Traefik address (if Traefik is configured to serve scheduler traffic on a separate port, that second traefik-* port goes under proxy_address). Using the same values as above:
from dask_gateway import Gateway

gateway = Gateway(
    "http://146.148.58.187",
)
You should now be able to use the gateway client to make API calls. To verify
this, call dask_gateway.Gateway.list_clusters()
. This should return an
empty list as you have no clusters running yet.
gateway.list_clusters()
Shutting everything down¶
If you’re done with the gateway, you’ll want to delete your deployment and clean everything up. You can do this with helm delete:
helm delete $RELEASE
Additional configuration¶
Here we provide a few configuration snippets for common deployment scenarios. For all available configuration fields see the Helm chart reference.
Using a custom image¶
By default schedulers/workers started by dask-gateway will use the ghcr.io/dask/dask-gateway image. This is a basic image with only the minimal dependencies installed. To use a custom image, you can configure the following (an illustrative snippet follows the list):
gateway.backend.image.name: the default image name
gateway.backend.image.tag: the default image tag
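For example, a config.yaml snippet pointing at a hypothetical custom image could look like the following (the image name and tag are placeholders, not a published image):
gateway:
  backend:
    image:
      # Placeholder image -- it must have a dask-gateway version installed
      # that matches the deployed dask-gateway-server
      name: ghcr.io/my-org/my-dask-image
      tag: "2024.1.0"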
For an image to work with dask-gateway, it must have a compatible version of dask-gateway installed (we recommend always using the same version as deployed on the dask-gateway-server).
Additionally, we recommend using an init process in your images. This isn’t strictly required, but running without an init process may lead to odd worker behaviors. We recommend using tini, but any init process should be fine.
There are no other requirements for images, any image that meets the above should work fine. You may install any additional libraries or dependencies you require.
We encourage you to maintain your own image for scheduler and worker pods as this project only provides a minimal image for testing purposes.
Using extraPodConfig / extraContainerConfig¶
The Kubernetes API is large, and not all configuration fields you may want to set on scheduler/worker pods are directly exposed by the Helm chart. To address this, we provide a few fields for forwarding configuration directly to the underlying kubernetes objects:
gateway.backend.scheduler.extraPodConfig
gateway.backend.scheduler.extraContainerConfig
gateway.backend.worker.extraPodConfig
gateway.backend.worker.extraContainerConfig
These allow configuring any unexposed fields on the pod/container for schedulers and workers respectively. Each takes a mapping of key-value pairs, which is deep-merged with any settings set by dask-gateway itself (with preference given to the extra*Config values). Note that keys should be camelCase (rather than snake_case) to match those in the kubernetes API.
For example, this can be useful for setting things like tolerations or node affinities on scheduler or worker pods. Here we configure a node anti-affinity for scheduler pods to avoid preemptible nodes:
gateway:
  backend:
    scheduler:
      extraPodConfig:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: cloud.google.com/gke-preemptible
                      operator: DoesNotExist
For information on allowed fields, see the Kubernetes API reference documentation (in particular the PodSpec and Container definitions).
Using extraConfig¶
Not all configuration options have been exposed via the helm chart. To set unexposed options, you can use the gateway.extraConfig field. This takes either:
A single python code-block (as a string) to append to the end of the generated dask_gateway_config.py file.
A map of keys -> code-blocks (recommended). When applied in this form, code-blocks are appended in alphabetical order by key (the keys themselves are meaningless). This allows merging multiple values.yaml files together, as Helm can natively merge maps.
For example, here we use gateway.extraConfig to set c.Backend.cluster_options, exposing options for worker resources and image (see Exposing Cluster Options for more information).
gateway:
  extraConfig:
    # Note that the key name here doesn't matter. Values in the
    # `extraConfig` map are concatenated, sorted by key name.
    clusteroptions: |
      from dask_gateway_server.options import Options, Integer, Float, String

      def option_handler(options):
          return {
              "worker_cores": options.worker_cores,
              "worker_memory": "%fG" % options.worker_memory,
              "image": options.image,
          }

      c.Backend.cluster_options = Options(
          Integer("worker_cores", 2, min=1, max=4, label="Worker Cores"),
          Float("worker_memory", 4, min=1, max=8, label="Worker Memory (GiB)"),
          String("image", default="daskgateway/dask-gateway:latest", label="Image"),
          handler=option_handler,
      )
For information on all available configuration options, see the Configuration Reference (in particular, the KubeClusterConfig section).
Authenticating with JupyterHub¶
JupyterHub provides a multi-user interactive notebook environment. Through the zero-to-jupyterhub-k8s project, many companies and institutions have set up JupyterHub to run on Kubernetes. When deploying Dask-Gateway alongside JupyterHub, you can configure Dask-Gateway to use JupyterHub for authentication.
Configuring a dask-gateway chart with a jupyterhub chart is more straightforward if they are installed in the same namespace, for two reasons. First, the JupyterHub chart generates api tokens for registered services and puts them in a k8s Secret that dask-gateway can make use of. Second, dask-gateway pods/containers can automatically detect the k8s Service created by the JupyterHub chart when both are in the same namespace.
If dask-gateway is installed in the same namespace as jupyterhub, this is the recommended configuration to use.
# jupyterhub chart configuration
hub:
  services:
    dask-gateway:
      display: false
Note
The display
attribute hides dask-gateway from the ‘Services’ dropdown in
the JupyterHub home page as dask-gateway doesn’t offer any UI.
# dask-gateway chart configuration
gateway:
  auth:
    type: jupyterhub
Note
This configuration relies on the dask-gateway chart’s default values for gateway.auth.jupyterhub.apiTokenFromSecretName and gateway.auth.jupyterhub.apiTokenFromSecretKey, which can be inspected in the default values.yaml file.
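If your JupyterHub release stores the service token under a different Secret name or key, these two lookups can be overridden in your dask-gateway config.yaml. The Secret name below is a placeholder for illustration; the chart defaults are hub and hub.services.dask-gateway.apiToken.
# dask-gateway chart configuration
gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      # Placeholder Secret name/key -- point these at wherever your JupyterHub
      # release stores the dask-gateway service api token
      apiTokenFromSecretName: my-hub-secret
      apiTokenFromSecretKey: hub.services.dask-gateway.apiToken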
If dask-gateway isn’t installed in the same namespace as jupyterhub, this is the recommended configuration procedure.
First generate an api token to use, for example using openssl:
openssl rand -hex 32
Once you have it, your configuration should look like below, where <API URL>
should look like https://<JUPYTERHUB-HOST>:<JUPYTERHUB-PORT>/hub/api
and
<API TOKEN>
should be the generated api token.
# jupyterhub chart configuration
hub:
  services:
    dask-gateway:
      apiToken: "<API TOKEN>"
      display: false
# dask-gateway chart configuration
gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: "<API TOKEN>"
      apiUrl: "<API URL>"
With JupyterHub authentication configured, JupyterHub is used to authenticate requests from users of the dask-gateway client to the dask-gateway server running in the api-dask-gateway pod.
Dask-Gateway client users should pass auth="jupyterhub" when they create a dask_gateway.Gateway object, or provide equivalent configuration for the dask-gateway client to authenticate with JupyterHub.
from dask_gateway import Gateway

gateway = Gateway(
    "http://146.148.58.187",
    auth="jupyterhub",
)
Helm chart reference¶
The full default values.yaml file for the dask-gateway Helm chart is included here for reference:
## Provide a name to partially substitute for the full names of resources (will maintain the release name)
##
nameOverride: ""

## Provide a name to substitute for the full names of resources
##
fullnameOverride: ""

# gateway nested config relates to the api Pod and the dask-gateway-server
# running within it, the k8s Service exposing it, as well as the schedulers
# (gateway.backend.scheduler) and workers (gateway.backend.worker) created by the
# controller when a DaskCluster k8s resource is registered.
gateway:
  # Number of instances of the gateway-server to run
  replicas: 1
  # Annotations to apply to the gateway-server pods.
  annotations: {}
  # Resource requests/limits for the gateway-server pod.
  resources: {}
  # Path prefix to serve dask-gateway api requests under
  # This prefix will be added to all routes the gateway manages
  # in the traefik proxy.
  prefix: /
  # The gateway server log level
  loglevel: INFO
  # The image to use for the dask-gateway-server pod (api pod)
  image:
    name: ghcr.io/dask/dask-gateway-server
    tag: "set-by-chartpress"
    pullPolicy:
  # Add additional environment variables to the gateway pod
  # e.g.
  # env:
  #   - name: MYENV
  #     value: "my value"
  env: []
  # Image pull secrets for gateway-server pod
  imagePullSecrets: []
  # Configuration for the gateway-server service
  service:
    annotations: {}

  auth:
    # The auth type to use. One of {simple, kerberos, jupyterhub, custom}.
    type: simple
    simple:
      # A shared password to use for all users.
      password:
    kerberos:
      # Path to the HTTP keytab for this node.
      keytab:
    jupyterhub:
      # A JupyterHub api token for dask-gateway to use. See
      # https://gateway.dask.org/install-kube.html#authenticating-with-jupyterhub.
      apiToken:
      # The JupyterHub Helm chart will automatically generate a token for a
      # registered service. If you don't specify an apiToken explicitly as
      # required in dask-gateway version <=2022.6.1, the dask-gateway Helm chart
      # will try to look for a token from a k8s Secret created by the JupyterHub
      # Helm chart in the same namespace. A failure to find this k8s Secret and
      # key will cause a MountFailure for when the api-dask-gateway pod is
      # starting.
      apiTokenFromSecretName: hub
      apiTokenFromSecretKey: hub.services.dask-gateway.apiToken
      # JupyterHub's api url. Inferred from JupyterHub's service name if running
      # in the same namespace.
      apiUrl:
    custom:
      # The full authenticator class name.
      class:
      # Configuration fields to set on the authenticator class.
      config: {}

  livenessProbe:
    # Enables the livenessProbe.
    enabled: true
    # Configures the livenessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6
  readinessProbe:
    # Enables the readinessProbe.
    enabled: true
    # Configures the readinessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3

  # nodeSelector, affinity, and tolerations for the `api` pod running dask-gateway-server
  nodeSelector: {}
  affinity: {}
  tolerations: []

  # Any extra configuration code to append to the generated `dask_gateway_config.py`
  # file. Can be either a single code-block, or a map of key -> code-block
  # (code-blocks are run in alphabetical order by key, the key value itself is
  # meaningless). The map version is useful as it supports merging multiple
  # `values.yaml` files, but is unnecessary in other cases.
  extraConfig: {}

  # backend nested configuration relates to the scheduler and worker resources
  # created for DaskCluster k8s resources by the controller.
  backend:
    # The image to use for both schedulers and workers.
    image:
      name: ghcr.io/dask/dask-gateway
      tag: "set-by-chartpress"
      pullPolicy:
    # Image pull secrets for a dask cluster's scheduler and worker pods
    imagePullSecrets: []
    # The namespace to launch dask clusters in. If not specified, defaults to
    # the same namespace the gateway is running in.
    namespace:
    # A mapping of environment variables to set for both schedulers and workers.
    environment: {}

    scheduler:
      # Any extra configuration for the scheduler pod. Sets
      # `c.KubeClusterConfig.scheduler_extra_pod_config`.
      extraPodConfig: {}
      # Any extra configuration for the scheduler container.
      # Sets `c.KubeClusterConfig.scheduler_extra_container_config`.
      extraContainerConfig: {}
      # Cores request/limit for the scheduler.
      cores:
        request:
        limit:
      # Memory request/limit for the scheduler.
      memory:
        request:
        limit:

    worker:
      # Any extra configuration for the worker pod. Sets
      # `c.KubeClusterConfig.worker_extra_pod_config`.
      extraPodConfig: {}
      # Any extra configuration for the worker container. Sets
      # `c.KubeClusterConfig.worker_extra_container_config`.
      extraContainerConfig: {}
      # Cores request/limit for each worker.
      cores:
        request:
        limit:
      # Memory request/limit for each worker.
      memory:
        request:
        limit:
      # Number of threads available for a worker. Sets
      # `c.KubeClusterConfig.worker_threads`
      threads:

# controller nested config relates to the controller Pod and the
# dask-gateway-server running within it that makes things happen when changes to
# DaskCluster k8s resources are observed.
controller:
  # Whether the controller should be deployed. Disabling the controller allows
  # running it locally for development/debugging purposes.
  enabled: true
  # Any annotations to add to the controller pod
  annotations: {}
  # Resource requests/limits for the controller pod
  resources: {}
  # Image pull secrets for controller pod
  imagePullSecrets: []
  # The controller log level
  loglevel: INFO
  # Max time (in seconds) to keep around records of completed clusters.
  # Default is 24 hours.
  completedClusterMaxAge: 86400
  # Time (in seconds) between cleanup tasks removing records of completed
  # clusters. Default is 5 minutes.
  completedClusterCleanupPeriod: 600
  # Base delay (in seconds) for backoff when retrying after failures.
  backoffBaseDelay: 0.1
  # Max delay (in seconds) for backoff when retrying after failures.
  backoffMaxDelay: 300
  # Limit on the average number of k8s api calls per second.
  k8sApiRateLimit: 50
  # Limit on the maximum number of k8s api calls per second.
  k8sApiRateLimitBurst: 100
  # The image to use for the controller pod.
  image:
    name: ghcr.io/dask/dask-gateway-server
    tag: "set-by-chartpress"
    pullPolicy:
  # Settings for nodeSelector, affinity, and tolerations for the controller pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

# traefik nested config relates to the traefik Pod and Traefik running within it
# that is acting as a proxy for traffic towards the gateway or user created
# DaskCluster resources.
traefik:
  # Number of instances of the proxy to run
  replicas: 1
  # Any annotations to add to the proxy pods
  annotations: {}
  # Resource requests/limits for the proxy pods
  resources: {}
  # The image to use for the proxy pod
  image:
    name: traefik
    tag: "2.10.6"
    pullPolicy:
  imagePullSecrets: []
  # Any additional arguments to forward to traefik
  additionalArguments: []
  # The proxy log level
  loglevel: WARN
  # Whether to expose the dashboard on port 9000 (enable for debugging only!)
  dashboard: false
  # Additional configuration for the traefik service
  service:
    type: LoadBalancer
    annotations: {}
    spec: {}
    ports:
      web:
        # The port HTTP(s) requests will be served on
        port: 80
        nodePort:
      tcp:
        # The port TCP requests will be served on. Set to `web` to share the
        # web service port
        port: web
        nodePort:
  # Settings for nodeSelector, affinity, and tolerations for the traefik pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

# rbac nested configuration relates to the choice of creating or replacing
# resources like (Cluster)Role, (Cluster)RoleBinding, and ServiceAccount.
rbac:
  # Whether to enable RBAC.
  enabled: true
  # Existing names to use if ClusterRoles, ClusterRoleBindings, and
  # ServiceAccounts have already been created by other means (leave set to
  # `null` to create all required roles at install time)
  controller:
    serviceAccountName:
  gateway:
    serviceAccountName:
  traefik:
    serviceAccountName:

# global nested configuration is accessible by all Helm charts that may depend
# on each other, but not used by this Helm chart. An entry is created here to
# validate its use and catch YAML typos via this configuration's associated JSON
# schema.
global: {}