Overview of specialty machines

The specialty machines controlled by the Slurm workload manager include:

Head node

  • hostname rsubmit.math.private.uwaterloo.ca
  • the head node is for job submission and short tasks only (e.g. code compilation); it is not for running jobs
  • head node hardware specifications
    # Nodes 1
    Node names slurm-pr2-01 (alias rsubmit.math.private)
    CPU model (2 per node) Xeon(R) CPU E5-2620 @ 2.00GHz (Sandy Bridge EP)
    # Cores per node 12
    Threads per core 2
    System Memory per node 64 GB
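
The spec tables list physical cores and hardware threads per core separately; the operating system exposes their product as logical CPUs, so the head node above presents 12 × 2 = 24 logical CPUs. The minimal C sketch below queries this count at run time (illustrative only; it reports whatever the node it runs on exposes).

    /* Minimal sketch: query the number of logical CPUs the OS exposes.
     * On a node with 12 cores and 2 threads per core this typically reports 24. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long logical_cpus = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Logical CPUs online: %ld\n", logical_cpus);
        return 0;
    }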

CPU clusters

This resource is intended for medium-sized parallel jobs (OpenMP or MPI) and for developing and testing parallel code. Large jobs are better served by the Digital Research Alliance of Canada (formerly Compute Canada).

  • hpc-pr3 cluster
    #Nodes 8
    Node names hpc-pr3-01 to hpc-pr3-08
    CPU model (2 per node) Xeon(R) Gold 6326 2.9 GHz (Ice Lake)
    #Cores per node 32
    Threads per core 2
    System Memory per node 128 GB
    compute node hardware specifications
  • hpc-pr2 cluster
    #Nodes 8
    Node names hpc-pr2-01 to hpc-pr2-08
    CPU model (2 per node) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (Ivy Bridge EP)
    #Cores per node 12
    Threads per core 2
    System Memory per node 64 GB
    compute node hardware specifications 
  • hagrid cluster
    #Nodes 8
    Node names hagrid01 to hagrid08
    CPU model (2 per node) Xeon(R) Silver 4114 CPU @ 2.20GHz
    #Cores per node 10
    Threads per core 1
    System Memory per node 187 GB
    compute node hardware specifications 
  • hagrid storage node
    #Nodes 1
    Node names hagrid-storage
    CPU model (2 per node) Xeon(R) Silver 4114 CPU @ 2.20GHz
    #Cores per node 10
    Threads per core 2
    System Memory per node 187 GB
    storage node hardware specifications
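
As a concrete illustration of the kind of work these CPU clusters are meant for, the sketch below is a minimal hybrid MPI + OpenMP program in C in which each rank reports the node it landed on and how many threads it is using. It is an illustrative example only; the source file name and the exact compile line (e.g. an MPI wrapper such as mpicc with -fopenmp) are assumptions that depend on your environment.

    /* Minimal hybrid MPI + OpenMP sketch: each rank reports its host and thread count.
     * Illustrative only; e.g. compile with: mpicc -fopenmp hello_hybrid.c -o hello_hybrid
     * and launch under Slurm. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, name_len;
        char node_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(node_name, &name_len);

        /* Each MPI rank opens an OpenMP parallel region on its own node. */
        #pragma omp parallel
        {
            #pragma omp single
            printf("Rank %d of %d on %s is using %d OpenMP threads\n",
                   rank, size, node_name, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }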

GPU servers

This resource is intended for GPU-specific computing on a small scale. Medium- and large-scale GPU jobs are better served by SHARCNET.

  • gpu-pr1-01
    CPU model (2 per machine) Intel Xeon E5-2680v4 2.4 GHz (Broadwell)
    # Cores 28
    Threads per core 1
    System Memory 128 GB
    GPU Type NVIDIA Tesla P100
    # of GPU devices 4
    GPU memory per device 16 GB
    research GPU server hardware specifications
  • gpu-pr1-02
    CPU model (2 per machine) AMD EPYC 7542 2.9 GHz
    # Cores 64
    Threads per core 1
    System Memory 1024 GB
    GPU Type NVIDIA Ampere A100 PCIe
    # of GPU devices 8
    GPU memory per device four with 40 GB, four with 80 GB
    research GPU server hardware specifications
  • gpu-pr1-03
    CPU model (2 per machine) Intel Xeon 8480+ 2.0 GHz (Sapphire Rapids)
    # Cores 56
    Threads per core 1
    System Memory 1024 GB
    GPU Type NVIDIA Hopper H100 SXM
    # of GPU devices 4
    GPU memory per device 80 GB
    research GPU server hardware specifications
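
When a job lands on one of these servers, it can be useful to confirm which GPUs are actually visible to it. The sketch below is a minimal C program using the CUDA runtime API that lists the visible devices and their memory; the file name and build line are assumptions (e.g. building with nvcc), and the output will depend on the node and the resources Slurm has granted.

    /* Minimal CUDA runtime device query: lists the GPUs visible to this process.
     * Illustrative only; e.g. build with: nvcc gpu_query.c -o gpu_query
     * (file and program names are placeholders). */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("Visible GPU devices: %d\n", count);
        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* totalGlobalMem is in bytes; convert to GiB for readability. */
            printf("  Device %d: %s, %.1f GiB, compute capability %d.%d\n",
                   i, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                   prop.major, prop.minor);
        }
        return 0;
    }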

Hybrid InfiniBand cluster - Mosaic

This InfiniBand-connected cluster is intended for GPU, CPU, and parallel jobs on a moderate scale. Anyone in Math (non-CS) may use it, but priority is given to the owners of the cluster. Jobs run by cluster owners will pre-empt jobs run by other users; a pre-empted job is resumed as soon as the higher-priority job ends. Medium- and large-scale GPU jobs are better served by SHARCNET.

There are four classes of machines in Mosaic: GPU, interactive GPU, CPU, and interactive CPU. The interactive machines are accessible only to the owners. Below are the specifications of the Mosaic cluster machines.

  • mosaic-gpu machines
    Node names mosaic-01 to mosaic-20
    # Nodes 19 (one node has failed)
    CPU model (2) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (Ivy Bridge EP)
    # Cores 20
    Threads per core 1
    System Memory 256 GB
    GPU Type NVIDIA Tesla K20m
    # of GPU devices 1
    GPU memory per device 5 GB
  • mosaic-cpu machines
    Node names mosaic-21 to mosaic-24
    # Nodes 4
    CPU model (2) Xeon(R) CPU E5-4650 @ 2.70GHz (Sandy Bridge EP)
    # Cores 32
    Threads per core 1
    System Memory 768 GB
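
Although pre-empted jobs on Mosaic are resumed automatically once the owner's job finishes, long-running programs on any of these machines still benefit from periodic checkpointing, which limits lost work after node failures or hitting time limits. The sketch below is a minimal, generic periodic-checkpoint loop in C; the checkpoint file name, the work loop, and the interval are all placeholders, not part of any cluster-provided mechanism.

    /* Minimal periodic-checkpoint sketch: saves the current step to a file at a
     * fixed interval so an interrupted run can resume near where it left off.
     * Illustrative only; file name, work loop, and interval are placeholders. */
    #include <stdio.h>
    #include <time.h>

    #define CHECKPOINT_FILE     "checkpoint.dat"  /* placeholder path */
    #define CHECKPOINT_INTERVAL 600               /* seconds between checkpoints */

    static long load_checkpoint(void)
    {
        long step = 0;
        FILE *f = fopen(CHECKPOINT_FILE, "r");
        if (f) {
            if (fscanf(f, "%ld", &step) != 1)
                step = 0;
            fclose(f);
        }
        return step;
    }

    static void save_checkpoint(long step)
    {
        FILE *f = fopen(CHECKPOINT_FILE, "w");
        if (f) {
            fprintf(f, "%ld\n", step);
            fclose(f);
        }
    }

    int main(void)
    {
        long step = load_checkpoint();   /* resume from the last saved step, if any */
        time_t last_save = time(NULL);

        for (; step < 1000000; ++step) {
            /* ... the real work for this step would go here ... */

            if (time(NULL) - last_save >= CHECKPOINT_INTERVAL) {
                save_checkpoint(step + 1);  /* next step to run after a restart */
                last_save = time(NULL);
            }
        }

        save_checkpoint(step);
        return 0;
    }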