Slurm resource limits

Putting it all together

Slurm offers a wide range of controls for defining and enforcing resource limits, but we use only two of them (both can be inspected from the command line, as sketched after this list):

  • partitions
    • set per-job limits such as the maximum runtime and the maximum number of nodes
  • accounts
    • set per-user limits such as the maximum number of jobs, CPUs, and memory
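
A minimal sketch of how to query these limits yourself, assuming a standard Slurm installation (scontrol and sacctmgr are standard Slurm commands; the partition name is taken from the table below):

    # per-job limits enforced by a partition (MaxTime, MaxNodes, ...)
    scontrol show partition gpu_a100

    # per-user limits attached to your account associations
    sacctmgr show associations user=$USER format=Account,Partition,MaxJobs,MaxTRES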

As an example, the table below shows all combinations of resource limits for two partitions (gpu_a100, cpu_pr3) and one account (normal).

  • use the --account and --partition command options to select the Slurm account and partition, respectively
  • it is crucial to know the available resources before submitting your Slurm job: if your request exceeds the limits of the selected account and partition, the job submission will fail (see the submission sketch after the table)
Limits for the normal account on each partition:

                 gpu_a100    cpu_pr3
  Max Nodes      1           6
  Max Runtime    180 hrs     180 hrs
  Max Jobs       5           5
  CPUs           60          60
  Memory         120 GB      120 GB
  GPU            1           0
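
For instance, a batch script that stays within the gpu_a100 limits of the normal account could look like the sketch below (the executable name is a placeholder; the #SBATCH options are standard Slurm flags, and the values come from the table above):

    #!/bin/bash
    #SBATCH --account=normal        # account selection
    #SBATCH --partition=gpu_a100    # partition selection
    #SBATCH --nodes=1               # at most Max Nodes (1)
    #SBATCH --time=180:00:00        # at most Max Runtime (180 hrs)
    #SBATCH --cpus-per-task=60      # at most CPUs (60)
    #SBATCH --mem=120G              # at most Memory (120 GB)
    #SBATCH --gres=gpu:1            # at most GPU (1)

    srun ./my_program               # placeholder for your executable

The same options can also be passed directly on the command line, e.g. sbatch --account=normal --partition=gpu_a100 job.sh.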