The specialty machines controlled by the Slurm workload manager include:
- head node
- CPU clusters
  - hpc-pr3
  - hpc-pr2
  - hagrid and hagrid-storage -- for members of the Bauch Lab only
- GPU servers
- Hybrid InfiniBand cluster - mosaic
Head node
- hostname rsubmit.math.private.uwaterloo.ca
- the head node is for job submission and short tasks only, e.g. code compilation; it is not for executing jobs
# Nodes | 1 |
---|---|
Node names | slurm-pr2-01 (alias rsubmit.math.private) |
CPU model (2 per node) | Xeon(R) CPU E5-2620 @ 2.00GHz (Sandy Bridge EP) |
# Cores per node | 12 |
Threads per core | 2 |
System Memory per node | 64 GB |
CPU clusters
This resource is intended for medium-sized parallel jobs (OpenMP or MPI) and for developing and testing parallel code. Large jobs are better served by the Digital Research Alliance of Canada (formerly Compute Canada).
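As a rough illustration of the kind of job these clusters target, below is a minimal hybrid MPI + OpenMP "hello" in C. It is a sketch only: the compiler wrapper (mpicc), the -fopenmp flag, and submission via sbatch are assumptions about a typical Slurm/MPI setup, so check which compilers and MPI modules are actually installed before relying on them.

```c
/* Minimal hybrid MPI + OpenMP hello: one line per MPI rank per OpenMP thread.
 * Sketch only -- compile on the head node (a short task), e.g.
 *   mpicc -fopenmp hello.c -o hello
 * then submit it to a compute partition with sbatch rather than running it
 * on rsubmit itself. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each MPI rank runs an OpenMP parallel region; the thread count is
     * usually controlled with OMP_NUM_THREADS or --cpus-per-task. */
    #pragma omp parallel
    printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
           rank, size, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```

On hpc-pr3, with 32 cores and 2 hardware threads per core per node, a common starting layout is a few MPI ranks per node with the remaining cores given to OpenMP threads; the best split depends on the code, so treat this only as a starting point.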
- hpc-pr3 cluster compute node hardware specifications

# Nodes | 8 |
---|---|
Node names | hpc-pr3-01 to hpc-pr3-08 |
CPU model (2 per node) | Xeon(R) Gold 6326 2.9 GHz (Ice Lake) |
# Cores per node | 32 |
Threads per core | 2 |
System Memory per node | 128 GB |

- hpc-pr2 cluster compute node hardware specifications

# Nodes | 8 |
---|---|
Node names | hpc-pr2-01 to hpc-pr2-08 |
CPU model (2 per node) | Xeon(R) CPU E5-2630 v2 @ 2.60GHz (Ivy Bridge EP) |
# Cores per node | 12 |
Threads per core | 2 |
System Memory per node | 64 GB |

- hagrid cluster compute node hardware specifications

# Nodes | 8 |
---|---|
Node names | hagrid01 to hagrid08 |
CPU model (2 per node) | Xeon(R) Silver 4114 CPU @ 2.20GHz |
# Cores per node | 10 |
Threads per core | 1 |
System Memory per node | 187 GB |

- hagrid cluster storage node hardware specifications

# Nodes | 1 |
---|---|
Node names | hagrid-storage |
CPU model (2 per node) | Xeon(R) Silver 4114 CPU @ 2.20GHz |
# Cores per node | 10 |
Threads per core | 2 |
System Memory per node | 187 GB |
GPU servers
This resource is intended for GPU-specific computing on a small scale. Medium and large-scale GPU jobs are better served by SHARCNET.
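Before running real GPU work it can help to confirm which devices a job was actually allocated. The host-side C sketch below uses the CUDA runtime API to list the visible GPUs and their memory; it assumes a CUDA toolkit is installed on these servers (build with nvcc, or with a C compiler plus -lcudart if the toolkit paths are configured), and the reported devices should match the table that follows.

```c
/* List the GPUs visible to the current job via the CUDA runtime API.
 * Sketch only -- assumes a CUDA toolkit is available on the GPU servers. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices visible to this job\n");
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* totalGlobalMem is in bytes; report it in GiB. */
        printf("GPU %d: %s, %.0f GiB\n", i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```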
- research GPU server hardware specifications
Node name | gpu-pr1-01 | gpu-pr1-02 | gpu-pr1-03 |
---|---|---|---|
CPU model (2 per machine) | Intel Xeon E5-2680v4 2.4 GHz (Broadwell) | AMD EPYC 7542 2.9 GHz | Intel Xeon 8480+ 2.0 GHz (Sapphire Rapids) |
# Cores | 28 | 64 | 56 |
Threads per core | 1 | 1 | 1 |
System Memory | 128 GB | 1024 GB | 1024 GB |
GPU Type | NVIDIA Tesla P100 | NVIDIA Ampere A100 PCI | NVIDIA Hopper H100 SXM |
# of GPU devices | 4 | 8 | 4 |
GPU memory per device | 16 GB | four with 40 GB, four with 80 GB | 80 GB |
Hybrid InfiniBand cluster - Mosaic
This InfiniBand-connected cluster is intended for GPU, CPU, and parallel jobs on a moderate scale. Anyone in Math (non-CS) may use it, but priority is given to the owners of the cluster: jobs run by cluster owners will pre-empt jobs run by other users, and a pre-empted job is resumed as soon as the higher-priority job ends. Medium and large-scale GPU jobs are better served by SHARCNET.
There are four classes of machines in Mosaic: GPU, interactive GPU, CPU, and interactive CPU. The interactive machines are accessible only by the owners. The specifications of the Mosaic cluster machines are below.
- mosaic-gpu machines
Node names | mosaic-01 to mosaic-20 |
---|---|
# Nodes | 19 (one node has failed) |
CPU model (2 per node) | Xeon(R) CPU E5-2680 v2 @ 2.80GHz (Ivy Bridge EP) |
# Cores | 20 |
Threads per core | 1 |
System Memory | 256 GB |
GPU Type | NVIDIA Tesla K20m |
# of GPU devices | 1 |
GPU memory per device | 5 GB |

- mosaic-cpu machines

Node names | mosaic-21 to mosaic-24 |
---|---|
# Nodes | 4 |
CPU model (2 per node) | Xeon(R) CPU E5-4650 @ 2.70GHz (Sandy Bridge EP) |
# Cores | 32 |
Threads per core | 1 |
System Memory | 768 GB |