Introduction
This page describes how to submit batch and interactive jobs. Below are examples for GPU jobs and interactive jobs. Batch jobs are submitted using the sbatch command; interactive jobs are submitted using srun or salloc.
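For reference, a typical batch submission looks like the following (a minimal sketch; myjob.sh is a placeholder for your own submission script):
# submit a batch script and note the job ID that sbatch prints
sbatch myjob.sh
# list your queued and running jobs
squeue -u $USER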
GPU jobs
For GPU jobs, first log in to tsubmit.math.private.uwaterloo.ca. Then submit the job to one of the available partitions (e.g. the gpu-gen partition). Below are examples of launching Python-based and CUDA-based code.
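Before submitting, you can check which GPU types (GRES) a partition offers (a minimal sketch; the -o format string selects the partition, GRES, node-count, and node-list columns):
# list the GPU GRES advertised by the gpu-gen partition
sinfo -p gpu-gen -o "%P %G %D %N"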
Launching Python GPU code on Slurm
To launch a GPU job:
- request a GPU device from Slurm using the --gres option
- create your Python code that calls GPU functions
- create a conda environment (as shown in the example below)
- activate the conda environment
- run your Python code
Launching Python TensorFlow code on Slurm:
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="mnist-test"
#SBATCH --partition=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
#SBATCH --mem=8000
#SBATCH --gres=gpu:gtx1080ti:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log

# Set the environment for anaconda3
module load anaconda3/2023.07.1

# set conda environment name
MY_CONDA_ENV="gpuenv1"

# create $MY_CONDA_ENV conda environment
conda create --yes --name $MY_CONDA_ENV \
    -c anaconda numpy pandas tensorflow-gpu

# activate conda environment
conda activate $MY_CONDA_ENV

# run your code
srun python tf-mnist-gpu.py
# or you can use
# python tf-mnist-gpu.py
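Before running a long job, it can be worth confirming that TensorFlow actually sees the allocated GPU (a minimal sketch using TensorFlow's tf.config API; run it in the job script after activating the environment):
# optional sanity check: list the GPUs visible to TensorFlow
srun python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"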
Launching Python PyTorch code on Slurm:
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="gpu-test"
#SBATCH --partition=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --mem=2GB
#SBATCH --gres=gpu:gtx1080ti:1
#SBATCH --output=stdout-%x_%j.log
#SBATCH --error=stderr-%x_%j.log

# Set the environment for anaconda3
module load anaconda3/2023.07.1

# set conda environment name
MY_CONDA_ENV="torchenv2"

# create $MY_CONDA_ENV conda environment
conda create --yes --name $MY_CONDA_ENV pytorch \
    torchvision torchaudio pytorch-cuda=12.1 \
    -c pytorch -c nvidia

# activate conda environment
conda activate $MY_CONDA_ENV

echo "Using python3: $(which python3)"
echo ""

# put your extra installs here
# using the conda command (recommended), e.g.
#   conda install --yes skorch
# or using the pip command, e.g.
#   pip install skorch

srun python3 torch-mnist-gpu.py
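Similarly, you can verify that PyTorch detects the GPU before the main run (a minimal sketch using the standard torch.cuda API; it assumes the job was allocated a GPU, since get_device_name fails otherwise):
# optional sanity check: confirm CUDA is available and show the device name
srun python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"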
Launching CUDA-based GPU code on Slurm
As with the Python examples above, request GPU GRES resources using the --gres option, and use the srun command to launch your executable (e.g. memtestG80) as shown in the script below.
#!/bin/bash
#SBATCH --mail-user=<UWuserid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name=gpu-gen
#SBATCH --partition=gpu-gen
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:rtx6000:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log

module load cuda/12.0.0

echo "== Start memory test ============"
srun ./memtestG80
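To see which GPU the scheduler actually assigned, you can run nvidia-smi as a one-off job under the same resource request (a minimal sketch; adjust the partition, account, and GRES to match your job):
# print the allocated GPU's model, driver version, and utilization
srun --partition=gpu-gen --account=normal \
     --gres=gpu:rtx6000:1 nvidia-smi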
Interactive jobs
Interactive jobs give you console access on the compute nodes, so you can work as though you were logged in to the node directly. Interactive jobs (sessions) are useful for work that requires direct user input. Examples of such situations are as follows:
- Compiling your code, especially when the compute node architecture differs from the headnode architecture. For example, it is best to compile CUDA code on a GPU machine whose GPU architecture matches that of the target device; in that case, create a Slurm interactive session on a GPU compute node.
- Testing and debugging code.
- Running applications that require a graphical user interface, such as X Window applications.
To launch interactive jobs, use the srun command with the --pty option. The basic form of this command is:
srun --pty bash -i
The --pty option tells srun to run Slurm task zero in pseudo-terminal mode, and bash's -i option starts the shell in interactive mode. With no resources explicitly specified, the job runs under Slurm's defaults: the default account, the default partition, and default resource allocations (number of CPUs, memory size, and so on).
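Once the shell starts, you can inspect what the default allocation actually gave you (a minimal sketch; SLURM_JOB_ID is set by Slurm inside the session, and the grep pattern simply picks out a few fields of interest):
# inside the interactive shell: show the job's partition, CPU, and memory settings
scontrol show job $SLURM_JOB_ID | grep -E "Partition|NumCPUs|Mem"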
Using srun command-line options, you can request any resources made available to you: for example, the --gres= option requests GPU devices and the --nodes= option specifies the number of nodes. You can also specify the partition with the --partition= option and the account with the --account= option. The command below demonstrates how to request an interactive session with one GPU device. In this case, select the gpu-gen partition and the normal account, because this combination allows a GPU device to be allocated.
srun --gres=gpu:gtx1080ti:1 \
--partition=gpu-gen \
--account=normal --pty /bin/bash
Note: You may not get an interactive session immediately. After you execute an srun command, the job is queued; you will get an interactive session on a compute node as soon as the requested resources become available.
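While the request is pending, you can check its state and Slurm's estimated start time from another shell on the submit host (a minimal sketch; the estimate is only available when the scheduler has enough information to compute one):
# show your queued jobs and their estimated start times
squeue -u $USER --start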