Know your resource limits

Slurm offers a wide range of controls for defining and enforcing resource limits, but we use only three of them: partitions, QoS objects, and accounts. The commands sketched after the list below show how to inspect each of these on a cluster.

  • partitions
    • set per-job limits such as the maximum runtime and the maximum number of nodes. Additional limits can be set by associating the partition with a QoS object.
  • QoS objects
    • set resource limits and priorities, and are then associated with partitions, accounts, or jobs.
  • accounts
    • set limits for all users associated with the account, for example the maximum number of jobs, CPUs, or memory. Additional limits can be set by associating the account with a QoS object.
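
On most clusters you can inspect how these limits are configured with Slurm's standard query commands. The fields shown below depend on the site's accounting configuration, so treat this as a sketch of the kind of query to run rather than the exact output you will see:

  # per-job limits of a partition (MaxTime, MaxNodes) and the QoS attached to it
  scontrol show partition gpu-k80

  # limits defined on QoS objects (wall time, TRES and job counts per user)
  sacctmgr show qos format=Name,MaxWall,MaxTRESPU,MaxJobsPU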

As an example, the table below shows the resource limits for each combination of two partitions (gpu-k80, gpu-gtx1080ti) and two accounts (normal, gpu-course).

  • use the --account and --partition options of the Slurm submission commands to select the account and the partition, respectively (see the example job script after the table)
  • it is crucial to know the available resources before submitting your Slurm job
                    gpu-k80     gpu-gtx1080ti
  normal
    Max Nodes       1           1
    Max Runtime     24 hr       24 hr
    Max Jobs        2           2
    CPUs            2           2
    Memory          8 GB        8 GB
    GPUs            1           1
  gpu-course
    Max Nodes       1           1
    Max Runtime     24 hr       24 hr
    Max Jobs        3           3
    CPUs            8           8
    Memory          16 GB       16 GB
    GPUs            2           2
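
As a concrete illustration, the job script below requests resources that stay within the gpu-course limits from the table. The executable is a placeholder; only the #SBATCH directives matter here:

  #!/bin/bash
  #SBATCH --account=gpu-course        # selects the account-level limits
  #SBATCH --partition=gpu-gtx1080ti   # selects the partition and its per-job limits
  #SBATCH --nodes=1                   # Max Nodes: 1
  #SBATCH --cpus-per-task=8           # CPUs: 8
  #SBATCH --mem=16G                   # Memory: 16 GB
  #SBATCH --gres=gpu:2                # GPUs: 2
  #SBATCH --time=24:00:00             # Max Runtime: 24 hr

  srun ./my_gpu_program               # placeholder for your actual program

Requesting, for example, --gres=gpu:3 or --time=48:00:00 with this account and partition pair would exceed the limits above.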

If you exceed the limits prescribed by your account and partition selection, the submission will fail.
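
One way to see which limits actually apply to your user before you submit is to list your account associations; the columns available depend on the local accounting setup, so this is only a sketch:

  # accounts, QoS and limits associated with your user
  sacctmgr show assoc user=$USER format=Account,Partition,QOS,MaxJobs,MaxTRES,MaxWall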