This example assumes you already have a UW user account and a Slurm account on the teaching GPU cluster.
It shows how to access and use the teaching GPU cluster, and explains the steps needed to set up a Python environment and run Python code as a GPU job. Follow similar procedures for other types of jobs such as OpenMP, MPI, and serial jobs.
The example assumes the source code was developed on your own machine and needs to be copied to the cluster. The user ID in this example is mathuser.
Copy files (if needed)
Copy the local source file (e.g. mnist-gpu.py) to the /work/mathuser/demo directory on the cluster.
$ scp mnist-gpu.py mathuser@tsubmit.math.private.uwaterloo.ca:/work/mathuser/demo
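If the demo directory does not already exist on the cluster, you can create it first over ssh (a one-off step, assuming the same host and path as in the scp command above):
$ ssh mathuser@tsubmit.math.private.uwaterloo.ca "mkdir -p /work/mathuser/demo"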
Log in to the head node
$ ssh mathuser@cpusubmit.math.private.uwaterloo.ca
Use your Nexus credentials to log in.
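After logging in, change to the directory that holds the source file, so the batch script and the job's log files end up in the same place (this is the same path used in the scp step above):
$ cd /work/mathuser/demo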
Create Slurm batch script
An example script (my-py-gpu-job.sh) is:
#!/bin/bash
#SBATCH --mail-user=mathuser@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="mnist-test"
#SBATCH --partition=gpu-k80
#SBATCH --account=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
#SBATCH --mem=8000
#SBATCH --gres=gpu:k80:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log
# Set the environment for anaconda3
module load anaconda3/2023.07.1
# set conda environment name
MY_CONDA_ENV="gpuenv1"
# create the $MY_CONDA_ENV conda environment
conda create --yes --name $MY_CONDA_ENV \
-c anaconda numpy pandas tensorflow-gpu
# activate conda environment
conda activate $MY_CONDA_ENV
# run your code as a Slurm job step
srun python mnist-gpu.py
# or run it directly:
# python mnist-gpu.py
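Note that the script above rebuilds the conda environment on every run, which adds package installation time to each job. A common alternative (a sketch, using the same module and environment name as above) is to create the environment once on the head node, then keep only the module load, conda activate, and srun lines in the batch script:
$ module load anaconda3/2023.07.1
$ conda create --yes --name gpuenv1 -c anaconda numpy pandas tensorflow-gpu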
Submit Slurm job
$ sbatch my-py-gpu-job.sh
Submitted batch job 54
Note the job ID number to use in other commands.
Check job status
To check the job status, use squeue with the job ID number:
$ squeue --jobs=54
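A few other commands are often useful; here the job ID 54 and user ID mathuser are just this example's values, and the log file names follow the --output and --error patterns set in the batch script:
$ squeue -u mathuser          # list all of your queued and running jobs
$ scancel 54                  # cancel the job
$ cat stdout-54.log           # view the job's output after it finishes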