Install on a Kubernetes Cluster
Here we provide instructions for installing and configuring dask-gateway-server on a Kubernetes cluster.
Create a Kubernetes Cluster (optional)
If you don’t already have a cluster running, you’ll want to create one. There are plenty of guides online for how to do this. We recommend following the excellent documentation provided by zero-to-jupyterhub-k8s.
Install Helm (optional)
If you don’t already have Helm installed, you’ll need to install it locally, and ensure tiller is running on your cluster. As above, there are plenty of instructional materials online for doing this. We recommend following the guide provided by zero-to-jupyterhub-k8s.
Install Dask-Gateway
At this point you should have a Kubernetes cluster with Helm installed and configured. You are now ready to install Dask-Gateway on your cluster.
Add the Helm Chart Repository
To avoid downloading the chart locally from GitHub, you can use the Dask-Gateway Helm chart repository.
$ helm repo add dask-gateway https://dask.org/dask-gateway-helm-repo/
$ helm repo update
Configuration
The Helm chart provides access to configure most aspects of the dask-gateway-server. These are provided via a YAML configuration file (the name of this file doesn’t matter; we’ll use config.yaml).
At a minimum, you’ll need to set a value for gateway.proxyToken. This is a random hex string representing 32 bytes, used as a security token between the gateway and its proxies. You can generate it using openssl:
$ openssl rand -hex 32
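If openssl isn’t available, the same kind of token can be generated with Python’s standard library. This is a hedged alternative, not part of the official instructions:

```python
# Generate a 32-byte random hex token, equivalent to `openssl rand -hex 32`.
import secrets

token = secrets.token_hex(32)
print(token)  # 64 hexadecimal characters
```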
Write the following into a new file config.yaml, replacing <RANDOM TOKEN> with the output of the command above.
gateway:
  proxyToken: "<RANDOM TOKEN>"
The Helm chart exposes many more configuration values; see the default values.yaml file for more information.
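As a sketch of what a slightly fuller config.yaml might look like, the snippet below combines the required token with worker resource limits. The field names come from the values.yaml reference at the bottom of this page; the specific numbers are placeholder assumptions, not recommendations:

```yaml
gateway:
  proxyToken: "<RANDOM TOKEN>"
  clusterManager:
    worker:
      cores:
        request: 1
        limit: 2
      memory:
        request: 2G
        limit: 4G
```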
Install the Helm Chart
To install the Dask-Gateway Helm chart, run the following command:
RELEASE=dask-gateway
NAMESPACE=dask-gateway
VERSION=0.6.1
helm upgrade --install \
    --namespace $NAMESPACE \
    --version $VERSION \
    --values path/to/your/config.yaml \
    $RELEASE \
    dask-gateway/dask-gateway
where:

- RELEASE is the Helm release name to use (we suggest dask-gateway, but any release name is fine).
- NAMESPACE is the Kubernetes namespace to install the gateway into (we suggest dask-gateway, but any namespace is fine).
- VERSION is the Helm chart version to use. To use the latest published version, you can omit the --version flag entirely. See the Helm chart repository for an index of all available versions.
- path/to/your/config.yaml is the path to your config.yaml file created above.
Running this command may take some time while resources are created and images are downloaded. When everything is ready, the following command will show the EXTERNAL-IP addresses for all LoadBalancer services:
$ kubectl get service --namespace dask-gateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
scheduler-api-dask-gateway ClusterIP 10.51.245.233 <none> 8001/TCP 6m54s
scheduler-public-dask-gateway LoadBalancer 10.51.253.105 35.202.68.87 8786:31172/TCP 6m54s
web-api-dask-gateway ClusterIP 10.51.250.11 <none> 8001/TCP 6m54s
web-public-dask-gateway LoadBalancer 10.51.247.160 146.148.58.187 80:30304/TCP 6m54s
At this point, you have a fully running dask-gateway-server.
Connecting to the gateway
To connect to the running dask-gateway-server, you’ll need the external IPs of both the web-public-* and scheduler-public-* services above.
The web-public-* service handles API requests and also proxies out the Dask dashboards. The scheduler-public-* service proxies TCP traffic between Dask clients and schedulers.
To connect, create a dask_gateway.Gateway object, specifying both addresses (the scheduler-public-* IP/port goes under proxy_address). Using the same values as above:
>>> from dask_gateway import Gateway
>>> gateway = Gateway(
... "http://146.148.58.187",
... proxy_address="tls://35.202.68.87:8786"
... )
You should now be able to use the gateway client to make API calls. To verify this, call dask_gateway.Gateway.list_clusters(). This should return an empty list, as you have no clusters running yet.
>>> gateway.list_clusters()
[]
Shutting everything down
When you’re done with the gateway, you’ll want to delete your deployment and clean everything up. You can do this with helm delete:
$ helm delete --purge $RELEASE
Additional configuration
Here we provide a few configuration snippets for common deployment scenarios. For all available configuration fields see the Helm chart reference.
Using extraPodConfig/extraContainerConfig
The Kubernetes API is large, and not all configuration fields you may want to set on scheduler/worker pods are directly exposed by the Helm chart. To address this, we provide a few fields for forwarding configuration directly to the underlying Kubernetes objects:
- gateway.clusterManager.scheduler.extraPodConfig
- gateway.clusterManager.scheduler.extraContainerConfig
- gateway.clusterManager.worker.extraPodConfig
- gateway.clusterManager.worker.extraContainerConfig
These allow configuring any unexposed fields on the pod/container for schedulers and workers respectively. Each takes a mapping of key-value pairs, which is deep-merged with any settings set by dask-gateway itself (with preference given to the extra*Config values). Note that keys should be camelCase (rather than snake_case) to match those in the Kubernetes API.
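The deep-merge behavior can be illustrated with a small stand-alone sketch. This illustrates the merge semantics described above, not dask-gateway’s actual implementation; the pod settings shown are hypothetical:

```python
def deep_merge(base, extra):
    """Recursively merge two mappings, preferring values from `extra`."""
    merged = dict(base)
    for key, value in extra.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Settings dask-gateway might generate for a scheduler pod (hypothetical values)
generated = {"spec": {"restartPolicy": "OnFailure", "automountServiceAccountToken": False}}

# User-supplied extraPodConfig, using camelCase keys as in the Kubernetes API
extra = {"spec": {"restartPolicy": "Never", "priorityClassName": "high"}}

merged = deep_merge(generated, extra)
print(merged)
# The extra value wins on conflict; non-conflicting keys from both sides survive.
```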
For example, this can be useful for setting things like tolerations or node affinities on scheduler or worker pods. Here we configure a node anti-affinity for scheduler pods to avoid preemptible nodes:
gateway:
  clusterManager:
    scheduler:
      extraPodConfig:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: cloud.google.com/gke-preemptible
                      operator: DoesNotExist
For information on allowed fields, see the Kubernetes API documentation.
Using extraConfig
Not all configuration options have been exposed via the Helm chart. To set unexposed options, you can use the gateway.extraConfig field. This takes either:

- A Python code-block (as a string) to append to the end of the generated dask_gateway_config.py file.
- A map of keys -> code-blocks. When applied in this form, code-blocks are appended in alphabetical order by key (the keys themselves are meaningless). This allows merging multiple values.yaml files together, as Helm can natively merge maps.
For example, here we use gateway.extraConfig to set c.DaskGateway.cluster_manager_options, exposing options for worker resources and image (see Exposing Cluster Options for more information).
gateway:
  extraConfig: |
    from dask_gateway_server.options import Options, Integer, Float, String

    def option_handler(options):
        return {
            "worker_cores": options.worker_cores,
            "worker_memory": "%fG" % options.worker_memory,
            "image": options.image,
        }

    c.DaskGateway.cluster_manager_options = Options(
        Integer("worker_cores", 2, min=1, max=4, label="Worker Cores"),
        Float("worker_memory", 4, min=1, max=8, label="Worker Memory (GiB)"),
        String("image", default="daskgateway/dask-gateway:latest", label="Image"),
        handler=option_handler,
    )
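To see what a handler like the one above produces, you can exercise it with plain Python. The options object here is a mocked stand-in built with SimpleNamespace; the real object is supplied by dask-gateway, and the input values are hypothetical:

```python
from types import SimpleNamespace

def option_handler(options):
    # Same shape of handler as in the extraConfig block above
    return {
        "worker_cores": options.worker_cores,
        "worker_memory": "%fG" % options.worker_memory,
        "image": options.image,
    }

# Stand-in for the options object a user would submit (hypothetical values)
opts = SimpleNamespace(worker_cores=3, worker_memory=6.0, image="daskgateway/dask-gateway:latest")
print(option_handler(opts))
# → {'worker_cores': 3, 'worker_memory': '6.000000G', 'image': 'daskgateway/dask-gateway:latest'}
```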
For information on all available configuration options, see the Configuration Reference (in particular, the KubeClusterManager section).
Authenticating with JupyterHub
JupyterHub provides a multi-user interactive notebook environment. Through the zero-to-jupyterhub-k8s project, many companies and institutions have set up JupyterHub to run on Kubernetes. When deploying Dask-Gateway alongside JupyterHub, you can configure Dask-Gateway to use JupyterHub for authentication. To do this, we register dask-gateway as a JupyterHub Service.
First we need to generate an API token; this is commonly done using openssl:
$ openssl rand -hex 32
Then add the following lines to your config.yaml file:
gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: "<API TOKEN>"
replacing <API TOKEN> with the output from above.
If you’re not deploying Dask-Gateway in the same cluster and namespace as JupyterHub, you’ll also need to specify JupyterHub’s API URL. This is usually of the form https://<JUPYTERHUB-HOST>:<JUPYTERHUB-PORT>/hub/api. If JupyterHub and Dask-Gateway are on the same cluster and namespace, you can omit this configuration key; the address will be inferred automatically.
gateway:
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: "<API TOKEN>"
      apiUrl: "<API URL>"
You’ll also need to add the following to the config.yaml file for your JupyterHub Helm chart:
hub:
  services:
    dask-gateway:
      apiToken: "<API TOKEN>"
again replacing <API TOKEN> with the output from above.
With this configuration, JupyterHub will be used to authenticate requests between users and the dask-gateway-server. Note that users will need to pass auth="jupyterhub" when they create a dask_gateway.Gateway object.
>>> from dask_gateway import Gateway
>>> gateway = Gateway(
... "http://146.148.58.187",
... proxy_address="tls://35.202.68.87:8786",
... auth="jupyterhub",
... )
Helm chart reference
The full default values.yaml file for the dask-gateway Helm chart is included here for reference:
gateway:
  # Annotations to apply to the gateway-server pod.
  annotations: {}
  # A 32 byte hex-encoded secret token for encrypting cookies.
  # Sets `c.DaskGateway.cookie_secret`.
  cookieSecret: null
  # A 32 byte hex-encoded secret token for authenticating with the proxies.
  # Sets `c.WebProxy.auth_token` and `c.SchedulerProxy.auth_token`.
  proxyToken: null
  # Resource requests/limits for the gateway-server pod.
  resources: {}
  # The image to use for the gateway-server pod.
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.6.1
    pullPolicy: IfNotPresent
  auth:
    # The auth type to use. One of {dummy, kerberos, jupyterhub, custom}.
    type: dummy
    dummy:
      # A shared password to use for all users.
      password: null
    kerberos:
      # Path to the HTTP keytab for this node.
      keytab: null
    jupyterhub:
      # A JupyterHub api token for dask-gateway to use. See
      # https://gateway.dask.org/install-kube.html#authenticating-with-jupyterhub.
      apiToken: null
      # JupyterHub's api url. Inferred from JupyterHub's service name if running
      # in the same namespace.
      apiUrl: null
    custom:
      # The full authenticator class name.
      class: null
      # Configuration fields to set on the authenticator class.
      options: {}
  clusterManager:
    # Timeout (in seconds) for starting a cluster.
    # Sets `c.KubeClusterManager.cluster_start_timeout`.
    clusterStartTimeout: null
    # Timeout (in seconds) for starting a worker.
    # Sets `c.KubeClusterManager.worker_start_timeout`.
    workerStartTimeout: null
    # The image to use for both schedulers and workers.
    image:
      name: daskgateway/dask-gateway
      tag: 0.6.1
      pullPolicy: IfNotPresent
    # A mapping of environment variables to set for both schedulers and workers.
    environment: null
    scheduler:
      # Any extra configuration for the scheduler pod. Sets
      # `c.KubeClusterManager.scheduler_extra_pod_config`.
      extraPodConfig: {}
      # Any extra configuration for the scheduler container.
      # Sets `c.KubeClusterManager.scheduler_extra_container_config`.
      extraContainerConfig: {}
      # Cores request/limit for the scheduler.
      cores:
        request: null
        limit: null
      # Memory request/limit for the scheduler.
      memory:
        request: null
        limit: null
    worker:
      # Any extra configuration for the worker pod. Sets
      # `c.KubeClusterManager.worker_extra_pod_config`.
      extraPodConfig: {}
      # Any extra configuration for the worker container. Sets
      # `c.KubeClusterManager.worker_extra_container_config`.
      extraContainerConfig: {}
      # Cores request/limit for each worker.
      cores:
        request: null
        limit: null
      # Memory request/limit for each worker.
      memory:
        request: null
        limit: null
  # Any extra configuration code to append to the generated `dask_gateway_config.py`
  # file. Can be either a single code-block, or a map of key -> code-block
  # (code-blocks are run in alphabetical order by key, the key value itself is
  # meaningless). The map version is useful as it supports merging multiple
  # `values.yaml` files, but is unnecessary in other cases.
  extraConfig: {}

schedulerProxy:
  # Annotations to apply to the scheduler-proxy pod.
  annotations: {}
  # Resource requests/limits for the scheduler-proxy pod.
  resources: {}
  # The image to use for the scheduler-proxy.
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.6.1
    pullPolicy: IfNotPresent
  # Service configuration for the scheduler-proxy service. See
  # https://kubernetes.io/docs/concepts/services-networking/service/ for more
  # information.
  service:
    annotations: {}
    type: LoadBalancer
    nodePort: null
    loadBalancerIP: null

webProxy:
  # Annotations to apply to the web-proxy pod.
  annotations: {}
  # Resource requests/limits for the web-proxy pod.
  resources: {}
  # The image to use for the web-proxy.
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.6.1
    pullPolicy: IfNotPresent
  # Service configuration for the web-proxy service. See
  # https://kubernetes.io/docs/concepts/services-networking/service/ for more
  # information.
  service:
    annotations: {}
    type: LoadBalancer
    nodePort: null
    loadBalancerIP: null

rbac:
  # Whether to enable RBAC.
  enabled: true

ingress:
  # Whether an Ingress object should be used.
  enabled: false
  # Annotations to apply to the Ingress.
  annotations: {}
  # A list of hosts to route requests to the proxy.
  hosts: []
  # TLS configuration for ingress. See
  # https://kubernetes.io/docs/concepts/services-networking/ingress/#tls for more
  # information.
  tls: []
  # The path to use when adding the gateway to the Ingress rules. See
  # https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-rules
  # for more information.
  path: /