Teaching GPU cluster Slurm partitions

Partitions in Slurm can be thought of as a resource abstraction. A partition configuration defines resource limits and access controls for a group of nodes. Slurm allocates resources to a job within the selected partition by weighing the resources the job requests against the partition's available resources and restrictions.

MFCF will adjust partition configurations as we observe usage patterns.

There is one partition in the teaching GPU cluster, gpu-gen, which gives access to all the GPU servers.

  • gpu-gen total available resources:
    Partition name            gpu-gen
    Total available memory    600 GB
    Max cores                 78
    Threads per core          2
    GPU devices               8 K80, 16 GTX1080ti, and 5 RTX 6000 Ada
    GPU memory per device     12 GB to 48 GB, depending on GPU
    Compute nodes             gpu-pt1-01, gpu-pt1-02, gpu-pt1-03, gpu-pt1-04
    gpu-gen partition specifications
  • gpu-gen resource limits:
    Max runtime               12 hours
    Max nodes                 1
    gpu-gen partition limits
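
As an illustration of how a request relates to these limits, the following Python sketch checks a job request against the gpu-gen figures listed above and renders a matching set of #SBATCH directives. It is not an MFCF tool: the names JobRequest, check_request, and sbatch_header are hypothetical, and the generic GRES name "gpu" in --gres is an assumption about how the cluster is configured. The core and memory figures are partition-wide totals, so a single node offers less than these upper bounds.

```python
from dataclasses import dataclass

# Limits copied from the gpu-gen tables above.
MAX_RUNTIME_HOURS = 12
MAX_NODES = 1
# Partition-wide totals: a single node has fewer cores and less memory,
# so these only serve as loose upper bounds.
TOTAL_CORES = 78
TOTAL_MEMORY_GB = 600


@dataclass
class JobRequest:
    """Hypothetical container for the resources a job asks for."""
    hours: int
    nodes: int = 1
    cpus: int = 1
    memory_gb: int = 4
    gpus: int = 1


def check_request(req: JobRequest) -> list[str]:
    """Return a list of problems; an empty list means no listed limit is exceeded."""
    problems = []
    if req.hours > MAX_RUNTIME_HOURS:
        problems.append(f"runtime {req.hours} h exceeds the {MAX_RUNTIME_HOURS} h limit")
    if req.nodes > MAX_NODES:
        problems.append(f"{req.nodes} nodes requested, but the limit is {MAX_NODES}")
    if req.cpus > TOTAL_CORES:
        problems.append(f"{req.cpus} cores exceeds the partition total of {TOTAL_CORES}")
    if req.memory_gb > TOTAL_MEMORY_GB:
        problems.append(f"{req.memory_gb} GB exceeds the partition total of {TOTAL_MEMORY_GB} GB")
    return problems


def sbatch_header(req: JobRequest) -> str:
    """Render #SBATCH directives for gpu-gen, refusing requests over the limits."""
    problems = check_request(req)
    if problems:
        raise ValueError("; ".join(problems))
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=gpu-gen",
        f"#SBATCH --nodes={req.nodes}",
        f"#SBATCH --cpus-per-task={req.cpus}",
        f"#SBATCH --mem={req.memory_gb}G",
        # Assumes the GPUs are exposed under the generic GRES name "gpu".
        f"#SBATCH --gres=gpu:{req.gpus}",
        f"#SBATCH --time={req.hours:02d}:00:00",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a 4-hour job with 4 cores, 16 GB of memory, and one GPU.
    print(sbatch_header(JobRequest(hours=4, cpus=4, memory_gb=16, gpus=1)))
```

The printed header can be pasted at the top of a batch script. Passing this check does not guarantee that a job will be scheduled, since Slurm applies the real per-node capacities and any other restrictions configured for the partition.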