Install on an HPC Job Queue

Here we provide instructions for installing and configuring dask-gateway-server on an HPC Job Queue system like PBS or Slurm. Note that dask-gateway-server only needs to be installed on one node (typically an edge node).

Currently only PBS and Slurm are supported, but support for additional backends is planned. If this is something you’re interested in, please file an issue.

Create a user account

Before installing anything, you’ll need to create the user account that will be used to run the dask-gateway-server process. The name of the user doesn’t matter; only its permissions do. Here we’ll use dask:

$ adduser dask

Create install directories

A dask-gateway-server installation has three types of files which will need their own directories created before installation:

  • Software files: This includes a Python environment and all required libraries. Here we use /opt/dask-gateway.

  • Configuration files: Here we use /etc/dask-gateway.

  • Runtime files: Here we use /var/dask-gateway.

# Software files
$ mkdir -p /opt/dask-gateway

# Configuration files
$ mkdir /etc/dask-gateway

# Runtime files
$ mkdir /var/dask-gateway
$ chown dask /var/dask-gateway

Install a Python environment

To avoid interactions between the system python installation and dask-gateway-server, we’ll install a full Python environment into the software directory. The easiest way to do this is to use miniconda, but this isn’t a strict requirement.

$ curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh
$ bash /tmp/miniconda.sh -b -p /opt/dask-gateway/miniconda
$ rm /tmp/miniconda.sh

We also recommend adding miniconda to the root user’s PATH to simplify the remaining commands.

$ echo 'export PATH="/opt/dask-gateway/miniconda/bin:$PATH"' >> /root/.bashrc
$ source /root/.bashrc

Install dask-gateway-server

Now we can install dask-gateway-server and its dependencies.

$ conda install -y -c conda-forge dask-gateway-server-jobqueue

Enable permissions for the dask user

For dask-gateway-server to work properly, you’ll need to enable sudo permissions for the dask user account. At a minimum, the dask account will need passwordless permissions to run the dask-gateway-jobqueue-launcher command (installed as part of dask-gateway-server above) as any dask-gateway user. The dask-gateway-jobqueue-launcher script is responsible for launching, tracking, and stopping batch jobs for individual users, and thus needs to be sudo-executed as them for permissions to be transferred appropriately.

An example entry in sudoers might look like:

Cmnd_Alias DASK_GATEWAY_JOBQUEUE_LAUNCHER = /opt/dask-gateway/miniconda/bin/dask-gateway-jobqueue-launcher

dask ALL=(%dask_users,!root) NOPASSWD:DASK_GATEWAY_JOBQUEUE_LAUNCHER

Additionally, when using PBS you’ll need to make the dask user a PBS Operator:

$ qmgr -c "set server operators += dask@pbs"

Operator level permissions are needed in PBS to allow dask-gateway-server to more efficiently track the status of all users’ jobs.

Configure dask-gateway-server

Now we’re ready to configure our dask-gateway-server installation. Configuration is written as a Python file (typically /etc/dask-gateway/dask_gateway_config.py). Options are assigned to a config object c, which is then loaded by the gateway on startup. You are free to use any Python syntax or libraries you want in this file; the only things that matter to dask-gateway-server are the values set on the c config object.

Here we’ll walk through a few common configuration options you may want to set.
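
Before diving into individual options, here’s a minimal sketch of what such a file might look like. This is only an illustration: the PBS backend is assumed, and the DASK_GATEWAY_PORT environment variable is a hypothetical example of using ordinary Python to compute a value before assigning it to c. Each option is explained in the sections below.

# /etc/dask-gateway/dask_gateway_config.py
# Only the values assigned to ``c`` are read by dask-gateway-server;
# any other Python (imports, variables, logic) is fine here.
import os

# Hypothetical: compute the public port from an environment variable
public_port = os.environ.get("DASK_GATEWAY_PORT", "8000")

# Use the PBS backend (see "Specify backend" below)
c.DaskGateway.backend_class = (
    "dask_gateway_server.backends.jobqueue.pbs.PBSBackend"
)

# Serve HTTP(S) traffic on the computed port (see below)
c.Proxy.address = ":" + public_port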

Specify backend

First you’ll need to specify which backend to use by setting c.DaskGateway.backend_class. You have a few options:

  • PBS: dask_gateway_server.backends.jobqueue.pbs.PBSBackend

  • Slurm: dask_gateway_server.backends.jobqueue.slurm.SlurmBackend

For example, here we configure the gateway to use the PBS backend:

# Configure the gateway to use PBS
c.DaskGateway.backend_class = (
    "dask_gateway_server.backends.jobqueue.pbs.PBSBackend"
)
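
Or, if you’re running Slurm instead:

# Configure the gateway to use Slurm
c.DaskGateway.backend_class = (
    "dask_gateway_server.backends.jobqueue.slurm.SlurmBackend"
)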

Configure the server addresses (optional)

By default, dask-gateway-server will serve all traffic through 0.0.0.0:8000. This includes both HTTP(S) requests (REST API, dashboards, etc…) and dask scheduler traffic.

If you’d like to serve at a different address, or serve web and scheduler traffic on different ports, you can configure the following fields:

  • c.Proxy.address - Serves HTTP(S) traffic, defaults to :8000.

  • c.Proxy.tcp_address - Serves dask client-to-scheduler tcp traffic, defaults to c.Proxy.address.

Here we configure web traffic to serve on port 8000 and scheduler traffic to serve on port 8001:

c.Proxy.address = ':8000'
c.Proxy.tcp_address = ':8001'

Specify user Python environments

Since the Dask workers/schedulers will be running on disparate nodes across the cluster, you’ll need to provide a way for Python environments to be available on every node. You have a few options here:

  • Use a fixed path to a Python environment available on every node

  • Allow users to specify the location of the Python environment (recommended)

In either case, the Python environment requires at least the dask-gateway package be installed to work properly.

Using a fixed environment path

If identical Python environments are available on every node (either on local disk or an NFS mount), you only need to configure dask-gateway-server to use the provided Python. This can be done in a few different ways:

# Configure the paths to the dask-scheduler/dask-worker CLIs
c.JobQueueClusterConfig.scheduler_cmd = "/path/to/dask-scheduler"
c.JobQueueClusterConfig.worker_cmd = "/path/to/dask-worker"

# OR
# Activate a local conda environment before startup
c.JobQueueClusterConfig.scheduler_setup = 'source /path/to/miniconda/bin/activate /path/to/environment'
c.JobQueueClusterConfig.worker_setup = 'source /path/to/miniconda/bin/activate /path/to/environment'

# OR
# Activate a virtual environment before startup
c.JobQueueClusterConfig.scheduler_setup = 'source /path/to/your/environment/bin/activate'
c.JobQueueClusterConfig.worker_setup = 'source /path/to/your/environment/bin/activate'

User-configurable python environments

Alternatively, you might want to allow users to provide their own Python environments. This can be useful, as it allows users to manage package versions themselves without needing to contact an admin for support.

This can be done by exposing an option for the Python environment in c.Backend.cluster_options. Exposing cluster options is discussed in detail in Exposing Cluster Options; here we’ll only provide a short example of one way to accomplish this.

from dask_gateway_server.options import Options, String

def options_handler(options):
    # Fill in the environment activation command template with the
    # user-provided environment name. This command is then used as the
    # setup script for both the scheduler and workers.
    setup = "source ~/miniconda/bin/activate %s" % options.environment
    return {"scheduler_setup": setup, "worker_setup": setup}

# Provide an option for users to specify the name or location of a
# conda environment to use for both the scheduler and workers.
# If not specified, the default environment of ``base`` is used.
c.Backend.cluster_options = Options(
    String("environment", default="base", label="Conda Environment"),
    handler=options_handler,
)
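
With this option in place, users can pick an environment when creating a cluster. As a rough sketch of the client side (my-env is a placeholder for whatever conda environment name the user wants to use):

>>> from dask_gateway import Gateway
>>> gateway = Gateway("http://public-address")
>>> # Fetch the options exposed by the gateway and select an environment
>>> options = gateway.cluster_options()
>>> options.environment = "my-env"
>>> cluster = gateway.new_cluster(options)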

Additional configuration options

dask-gateway-server has several additional configuration fields. See the Configuration Reference docs (specifically the jobqueue configuration docs) for more information on all available options. At a minimum you’ll probably want to configure the worker resource limits.

# The resource limits for a worker
c.JobQueueClusterConfig.worker_memory = '4 G'
c.JobQueueClusterConfig.worker_cores = 2

If your cluster is under high load (and jobs may be slow to start), you may also want to increase the cluster/worker timeout values:

# Increase startup timeouts to 10 minutes (600 seconds) each
c.JobQueueBackend.cluster_start_timeout = 600
c.JobQueueBackend.worker_start_timeout = 600

Example

In summary, an example dask_gateway_config.py configuration for PBS might look like:

# Configure the gateway to use PBS as the backend
c.DaskGateway.backend_class = (
    "dask_gateway_server.backends.jobqueue.pbs.PBSBackend"
)

# Configure the paths to the dask-scheduler/dask-worker CLIs
c.PBSClusterConfig.scheduler_cmd = "~/miniconda/bin/dask-scheduler"
c.PBSClusterConfig.worker_cmd = "~/miniconda/bin/dask-worker"

# Limit resources for a single worker
c.PBSClusterConfig.worker_memory = '4 G'
c.PBSClusterConfig.worker_cores = 2

# Specify the PBS queue to use
c.PBSClusterConfig.queue = 'dask'

# Increase startup timeouts to 10 minutes (600 seconds) each
c.PBSBackend.cluster_start_timeout = 600
c.PBSBackend.worker_start_timeout = 600

Open relevant ports

For users to access the gateway server, they’ll need access to the public port(s) set in Configure the server addresses (optional) above (by default this is port 8000). How to expose ports is system specific - cluster administrators should determine how best to perform this task.

Start dask-gateway-server

At this point you should be able to start the gateway server as the dask user, using the configuration file you created. dask-gateway-server is a long-running process; how you intend to manage it (supervisord, etc…) is system specific. The requirements are:

  • Start with dask as the user

  • Start with /var/dask-gateway as the working directory

  • Add /opt/dask-gateway/miniconda/bin to path

  • Specify the configuration file location with -f /etc/dask-gateway/dask_gateway_config.py

For ease, we recommend creating a small bash script stored at /opt/dask-gateway/start-dask-gateway to set this up:

#!/usr/bin/env bash

export PATH="/opt/dask-gateway/miniconda/bin:$PATH"
cd /var/dask-gateway
dask-gateway-server -f /etc/dask-gateway/dask_gateway_config.py

For testing, here’s how you might start dask-gateway-server manually:

$ cd /var/dask-gateway
$ sudo -iu dask /opt/dask-gateway/start-dask-gateway

Validate things are working

If the server started with no errors, you’ll want to check that things are working properly. The easiest way to do this is to try connecting as a user.

A user’s environment requires the dask-gateway client library to be installed.

# Install the dask-gateway client library
$ conda create -n dask-gateway -c conda-forge dask-gateway

You can connect to the gateway by creating a dask_gateway.Gateway object, specifying the public address (note that if you configured c.Proxy.tcp_address you’ll also need to specify the proxy_address).

>>> from dask_gateway import Gateway
>>> gateway = Gateway("http://public-address")
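
For example, if you split web and scheduler traffic across ports 8000 and 8001 as shown earlier, connecting might look like the following sketch (public-address is a placeholder, and the tls:// scheme and port numbers are assumptions based on the configuration above — adjust them for your deployment):

>>> gateway = Gateway(
...     "http://public-address:8000",
...     proxy_address="tls://public-address:8001",
... )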

You should now be able to make API calls. Try dask_gateway.Gateway.list_clusters(); this should return an empty list.

>>> gateway.list_clusters()
[]

Next, see if you can create a cluster. This may take a few minutes.

>>> cluster = gateway.new_cluster()

The last thing you’ll want to check is if you can successfully connect to your newly created cluster.

>>> client = cluster.get_client()
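
As an optional extra check, you can ask for a couple of workers and run a small computation through the client (a minimal sketch — worker jobs may take a few minutes to start, and the worker count should respect your queue limits):

>>> cluster.scale(2)   # request two workers from the job queue
>>> client.submit(lambda x: x + 1, 41).result()   # blocks until a worker is up
42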

If everything worked properly, you can shutdown your cluster with dask_gateway.GatewayCluster.shutdown().

>>> cluster.shutdown()