Overview of specialty machines

The specialty machines controlled by the Slurm workload manager include:

Head node

  • hostname rsubmit.math.private.uwaterloo.ca
  • the head node is for job submission and short tasks only (e.g. code compilation); it is not for running jobs
  • head node hardware specifications
    # Nodes 1
    Node names slurm-pr2-01 (alias rsubmit.math.private)
    CPU model (2 per node) Xeon(R) CPU E5-2620 @ 2.00GHz (Sandy Bridge EP)
    # Cores per node 12
    Threads per core 2
    System Memory per node 64 GB
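
The spec tables list physical cores and hardware threads per core separately; the operating system exposes their product as logical CPUs, so the head node above presents 12 × 2 = 24 logical CPUs. The minimal C sketch below queries this count at run time (illustrative only; it reports whatever the node it runs on exposes).

    /* Minimal sketch: query the number of logical CPUs the OS exposes.
     * On a node with 12 cores and 2 threads per core this typically reports 24. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long logical_cpus = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Logical CPUs online: %ld\n", logical_cpus);
        return 0;
    }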

CPU clusters

This resource is intended for medium-sized parallel jobs (OpenMP or MPI) and for developing and testing parallel code. Large jobs are better served by the Digital Research Alliance of Canada (formerly Compute Canada).

  • hpc-pr3 cluster
    #Nodes 8
    Node names hpc-pr3-01 to hpc-pr3-08
    CPU model (2 per node) Xeon(R) Gold 6326 2.9 GHz (Ice Lake)
    #Cores per node 32
    Threads per core 2
    System Memory per node 128 GB
    compute node hardware specifications
  • hpc-pr2 cluster
    #Nodes 8
    Node names hpc-pr2-01 to hpc-pr2-08
    CPU model (2 per node) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (Ivy Bridge EP)
    #Cores per node 12
    Threads per core 2
    System Memory per node 64 GB
    compute node hardware specifications 
  • hagrid cluster
    #Nodes 8
    Node names hagrid01 to hagrid08
    CPU model (2 per node) Xeon(R) Silver 4114 CPU @ 2.20GHz
    #Cores per node 10
    Threads per core 1
    System Memory per node 187 GB
    compute node hardware specifications 
  • hagrid storage node
    #Nodes 1
    Node names hagrid-storage
    CPU model (2 per node) Xeon(R) Silver 4114 CPU @ 2.20GHz
    #Cores per node 10
    Threads per core 2
    System Memory per node 187 GB
    storage node hardware specifications
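
As a concrete illustration of the kind of work these CPU clusters are meant for, the sketch below is a minimal hybrid MPI + OpenMP program in C in which each rank reports the node it landed on and how many threads it is using. It is an illustrative example only; the source file name and the exact compile line (e.g. an MPI wrapper such as mpicc with -fopenmp) are assumptions that depend on your environment.

    /* Minimal hybrid MPI + OpenMP sketch: each rank reports its host and thread count.
     * Illustrative only; e.g. compile with: mpicc -fopenmp hello_hybrid.c -o hello_hybrid
     * and launch under Slurm. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, name_len;
        char node_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(node_name, &name_len);

        /* Each MPI rank opens an OpenMP parallel region on its own node. */
        #pragma omp parallel
        {
            #pragma omp single
            printf("Rank %d of %d on %s is using %d OpenMP threads\n",
                   rank, size, node_name, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }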

GPU servers

This resource is intended for GPU-specific computing on a small scale. Medium- and large-scale GPU jobs are better served by SHARCNET.

  • gpu-pr1-01
    CPU model (2 per machine) Intel Xeon E5-2680v4 2.4 GHz (Broadwell)
    # Cores 28
    Threads per core 1
    System Memory 128 GB
    GPU Type NVIDIA Tesla P100
    # of GPU devices 4
    GPU memory per device 16 GB
    research GPU server hardware specifications
  • gpu-pr1-02
    CPU model (2 per machine) AMD EPYC 7542 2.9 GHz
    # Cores 64
    Threads per core 1
    System Memory 1024 GB
    GPU Type NVIDIA Ampere A100 PCIe
    # of GPU devices 8
    GPU memory per device four with 40 GB, four with 80 GB
    research GPU server hardware specifications
  • gpu-pr1-03
    CPU model (2 per machine) Intel Xeon 8480+ 2.0 GHz (Sapphire Rapids)
    # Cores 56
    Threads per core 1
    System Memory 1024 GB
    GPU Type NVIDIA Hopper H100 SXM
    # of GPU devices 4
    GPU memory per device 80 GB
    research GPU server hardware specifications
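
When a job lands on one of these servers, it can be useful to confirm which GPUs are actually visible to it. The sketch below is a minimal C program using the CUDA runtime API that lists the visible devices and their memory; the file name and build line are assumptions (e.g. building with nvcc), and the output will depend on the node and the resources Slurm has granted.

    /* Minimal CUDA runtime device query: lists the GPUs visible to this process.
     * Illustrative only; e.g. build with: nvcc gpu_query.c -o gpu_query
     * (file and program names are placeholders). */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("Visible GPU devices: %d\n", count);
        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* totalGlobalMem is in bytes; convert to GiB for readability. */
            printf("  Device %d: %s, %.1f GiB, compute capability %d.%d\n",
                   i, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                   prop.major, prop.minor);
        }
        return 0;
    }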

Hybrid InfiniBand cluster - Mosaic

This InfiniBand-connected cluster is intended for GPU, CPU, and parallel jobs on a moderate scale. Anyone in Math (non-CS) may use it, but priority is given to the owners of the cluster. Jobs run by cluster owners will pre-empt jobs run by other users; a pre-empted job is resumed as soon as the higher-priority job ends. Medium- and large-scale GPU jobs are better served by SHARCNET.

There are four classes of machines in Mosaic: GPU, interactive GPU, CPU, and interactive CPU. The interactive machines are accessible only to the owners. Below are the specifications of the Mosaic cluster machines.

  • mosaic-gpu machines
    Node names mosaic-01 to mosaic-20
    # Nodes 19 (one node has failed)
    CPU model (2) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (Ivy Bridge EP)
    # Cores 20
    Threads per core 1
    System Memory 256 GB
    GPU Type NVIDIA Tesla K20m
    # of GPU devices 1
    GPU memory per device 5 GB
  • mosaic-cpu machines
    Node names mosaic-21 to mosaic-24
    # Nodes 4
    CPU model (2) Xeon(R) CPU E5-4650 @ 2.70GHz (Sandy Bridge EP)
    # Cores 32
    Threads per core 1
    System Memory 768 GB
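
Although pre-empted jobs on Mosaic are resumed automatically once the owner's job finishes, long-running programs on any of these machines still benefit from periodic checkpointing, which limits lost work after node failures or hitting time limits. The sketch below is a minimal, generic periodic-checkpoint loop in C; the checkpoint file name, the work loop, and the interval are all placeholders, not part of any cluster-provided mechanism.

    /* Minimal periodic-checkpoint sketch: saves the current step to a file at a
     * fixed interval so an interrupted run can resume near where it left off.
     * Illustrative only; file name, work loop, and interval are placeholders. */
    #include <stdio.h>
    #include <time.h>

    #define CHECKPOINT_FILE     "checkpoint.dat"  /* placeholder path */
    #define CHECKPOINT_INTERVAL 600               /* seconds between checkpoints */

    static long load_checkpoint(void)
    {
        long step = 0;
        FILE *f = fopen(CHECKPOINT_FILE, "r");
        if (f) {
            if (fscanf(f, "%ld", &step) != 1)
                step = 0;
            fclose(f);
        }
        return step;
    }

    static void save_checkpoint(long step)
    {
        FILE *f = fopen(CHECKPOINT_FILE, "w");
        if (f) {
            fprintf(f, "%ld\n", step);
            fclose(f);
        }
    }

    int main(void)
    {
        long step = load_checkpoint();   /* resume from the last saved step, if any */
        time_t last_save = time(NULL);

        for (; step < 1000000; ++step) {
            /* ... the real work for this step would go here ... */

            if (time(NULL) - last_save >= CHECKPOINT_INTERVAL) {
                save_checkpoint(step + 1);  /* next step to run after a restart */
                last_save = time(NULL);
            }
        }

        save_checkpoint(step);
        return 0;
    }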