Introduction
This page describes how to submit batch and interactive jobs to the Slurm scheduler. Below are examples for GPU jobs and interactive jobs. Batch jobs are submitted with the sbatch command; interactive jobs are started with the srun or salloc commands.
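For example, a batch script (such as the GPU scripts shown below) can be submitted and monitored as follows; the script name gpu-job.sh is only a placeholder:
# submit the batch script to the scheduler
sbatch gpu-job.sh
# check the status of your queued and running jobs
squeue -u $USER
# cancel a job if needed (replace <jobid> with the ID reported by sbatch)
scancel <jobid>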
GPU jobs
For GPU jobs, first log in to tsubmit.math.private.uwaterloo.ca. Then submit the job to one of the available GPU partitions (e.g. the gpu-gtx1080ti partition). Below are examples for launching Python-based and CUDA-based code.
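For instance, after logging in you can list the partitions and the GPU resources (GRES) they offer before choosing one; a quick sketch, with <UWuserid> standing in for your own user ID:
# log in to the Slurm submit host
ssh <UWuserid>@tsubmit.math.private.uwaterloo.ca
# list partitions with their time limits, GRES (GPUs), and nodes
sinfo -o "%P %l %G %N"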
Launching Python GPU code on Slurm
To launch a GPU job, first request a GPU device from Slurm using the --gres option. Then write Python code that calls GPU functions. On the MFCF teaching Slurm cluster, running Python code involves three steps: create a conda environment (as shown in the example below), activate it, and then run your Python code.
Launching Python TensorFlow code on Slurm:
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="mnist-test"
#SBATCH --partition=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
#SBATCH --mem=8000
#SBATCH --gres=gpu:gtx1080ti:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log
# Set the environment for anaconda3
module load anaconda3/2023.07.1
# set conda environment name
MY_CONDA_ENV="gpuenv1"
# create the $MY_CONDA_ENV conda environment...
conda create --yes --name $MY_CONDA_ENV \
-c anaconda numpy pandas tensorflow-gpu
# activate conda environment
conda activate $MY_CONDA_ENV
# run your code
srun python tf-mnist-gpu.py
# or you can use
# python tf-mnist-gpu.py
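To confirm that the job actually sees the allocated GPU, you could add a quick optional check to the script before running your own code (assuming TensorFlow 2.x is installed in the environment):
# optional: verify the GPU is visible inside the job
srun nvidia-smi
srun python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"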
Launching Python PyTorch code on Slurm:
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="gpu-test"
#SBATCH --partition=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --mem=2GB
#SBATCH --gres=gpu:gtx1080ti:1
#SBATCH --output=stdout-%x_%j.log
#SBATCH --error=stderr-%x_%j.log
# Set the environment for anaconda3
module load anaconda3/2023.07.1
# set conda environment name
MY_CONDA_ENV="torchenv2"
# create $MY_CONDA_ENV conda environment...
conda create --yes --name $MY_CONDA_ENV pytorch \
torchvision torchaudio pytorch-cuda=12.1 \
-c pytorch -c nvidia
conda activate $MY_CONDA_ENV
echo "echo: $(which python3) "
echo ""
# put your extra installs here
# using conda command (recommended), e.g.
# conda install --yes skorch
# or using the pip command, e.g.
# pip install skorch
srun python3 torch-mnist-gpu.py
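As with the TensorFlow example, you may want to confirm that PyTorch detects the allocated GPU before launching the main script; a minimal optional check:
# optional: verify that PyTorch detects the allocated GPU
srun python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"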
Launching CUDA-based GPU code on Slurm
As with the Python examples, request GPU resources using the --gres option. Then use the srun command to launch your executable (e.g. memtestG80), as shown in the script below.
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --partition=gpu-k80
#SBATCH --gres=gpu:k80:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log
module load cuda/12.0.0
echo "== Start memory test ============"
srun ./memtestG80
Interactive jobs
Interactive jobs give you console access on the compute nodes, so you can work as if you were logged in to the node directly. Interactive jobs (sessions) are useful for work that requires direct user input, for example:
- Compiling your code, especially when the compute node architecture differs from the head node architecture. For example, it is best to compile CUDA code on a GPU machine whose GPU architecture matches the target device; in this case, create a Slurm interactive session on a GPU compute node (see the sketch after this list).
- Testing and debugging code.
- Running applications with a graphical user interface, such as X Window applications.
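As an illustration of the first case above, once an interactive session on a GPU node has been obtained (as described below), a compile step might look like the following sketch. The source file name, output name, and architecture flag are assumptions, not fixed requirements:
# load the CUDA toolchain on the GPU node
module load cuda/12.0.0
# compile a hypothetical CUDA source file for the gtx1080ti (compute capability 6.1)
nvcc -O2 -arch=sm_61 -o my_gpu_app my_gpu_app.cu
# run a quick test on the allocated GPU
./my_gpu_app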
To launch interactive jobs, use srun with the --pty option. The basic form of this command is:
srun --pty bash -i
The --pty option runs Slurm task zero in pseudo-terminal mode, and bash's -i option tells the shell to run interactively. With no resources explicitly specified, this job runs under the default Slurm settings: default account, default partition, and resource-allocation defaults such as the number of CPUs, memory size, and so on.
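Once the interactive shell starts, you can check what Slurm actually allocated to the session, for example:
# show the job ID of the current interactive session
echo $SLURM_JOB_ID
# show the full allocation details (partition, account, CPUs, memory, GRES)
scontrol show job $SLURM_JOB_ID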
Using srun command-line options, you can request any resources made available to you: for example, the --gres= option requests GPU devices, the --nodes= option specifies the number of nodes, the --partition= option selects the partition, and the --account= option selects the account. The command below demonstrates how to request an interactive session with one GPU device. In this case, the gpu-gen partition and the normal account are selected because this combination allows a GPU device to be allocated.
srun --gres=gpu:gtx1080ti:1 \
--partition=gpu-gen \
--account=normal --pty /bin/bash
Note: You may not get an interactive session immediately. After you execute the srun command, the job is queued, and the interactive session starts on the compute node(s) as soon as the requested resources become available.