This example assumes you already have a UW user account and a Slurm account on the teaching GPU cluster.
It shows how to access and use the teaching GPU cluster, and explains the steps needed to set up a Python environment and run Python code as a GPU job. Follow similar procedures for other types of jobs such as OpenMP, MPI, and serial jobs.
The example assumes the source code was developed on your own machine and needs to be copied to the cluster. The user ID in this example is mathuser.
Copy files (if needed)
Copy the local source file (e.g. mnist-gpu.py) to the /work/mathuser/demo directory on the cluster.
$ scp mnist-gpu.py mathuser@tsubmit.math.private.uwaterloo.ca:/work/mathuser/demo
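If the demo directory does not already exist on the cluster, you can create it first over ssh (a one-off step, assuming the same host and path as in the scp command above):
$ ssh mathuser@tsubmit.math.private.uwaterloo.ca "mkdir -p /work/mathuser/demo"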
Log in to the head node
$ ssh mathuser@cpusubmit.math.private.uwaterloo.ca
Use your Nexus credentials to log in.
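After logging in, change to the directory that holds the source file, so the batch script and the job's log files end up in the same place (this is the same path used in the scp step above):
$ cd /work/mathuser/demo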
Create Slurm batch script
An example script (my-py-gpu-job.sh) is:
#!/bin/bash
#SBATCH --mail-user=mathuser@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="mnist-test"
#SBATCH --partition=gpu-k80
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
#SBATCH --mem=8000
#SBATCH --gres=gpu:k80:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log
# Set the environment for anaconda3
module load anaconda3/2023.07.1
# set conda environment name
MY_CONDA_ENV="gpuenv1"
# create the $MY_CONDA_ENV conda environment
conda create --yes --name $MY_CONDA_ENV \
-c anaconda numpy pandas tensorflow-gpu
# activate conda environment
conda activate $MY_CONDA_ENV
# run your code as a Slurm job step
srun python mnist-gpu.py
# or run it directly:
# python mnist-gpu.py
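Note that the script above rebuilds the conda environment on every run, which adds package installation time to each job. A common alternative (a sketch, using the same module and environment name as above) is to create the environment once on the head node, then keep only the module load, conda activate, and srun lines in the batch script:
$ module load anaconda3/2023.07.1
$ conda create --yes --name gpuenv1 -c anaconda numpy pandas tensorflow-gpu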
Submit Slurm job
$ sbatch my-py-gpu-job.sh
Submitted batch job 54
Note the job ID number to use in other commands.
Check job status
To check the job status, use squeue with the job ID number:
$ squeue --jobs=54
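A few other commands are often useful; here the job ID 54 and user ID mathuser are just this example's values, and the log file names follow the --output and --error patterns set in the batch script:
$ squeue -u mathuser          # list all of your queued and running jobs
$ scancel 54                  # cancel the job
$ cat stdout-54.log           # view the job's output after it finishes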