Slurm offers a wide range of controls for defining and enforcing resource limits, but we use only three: partitions, QoS objects, and accounts.
- **partitions** - set per-job limits such as the maximum runtime and the maximum number of nodes. Other limits can be set by associating the partition with a QoS object.
- **QoS objects** - set resource limits and priorities; a QoS is then associated with partitions, accounts, or jobs.
- **accounts** - set limits for all users associated with the account, for example the maximum number of jobs, CPUs, or memory. Limits can also be set by associating the account with QoS objects.
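To see which limits actually apply on the cluster, you can query Slurm directly. A minimal sketch, assuming the partition names used below; the `format=` field lists are illustrative and can be adjusted:

```bash
# Show the per-job limits defined by a partition (e.g. max runtime, max nodes)
scontrol show partition gpu-k80

# Show the limits and priorities attached to QoS objects
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPerUser

# Show the account/partition/QoS associations available to your user
sacctmgr show assoc user=$USER format=Account,Partition,QOS
```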
As an example, the table below shows all combinations of resource limits for two partitions (gpu-k80, gpu-gtx1080ti) and two accounts (normal, gpu-course).
- use the `--account` and `--partition` options of the Slurm commands to select the Slurm account and partition, respectively. It is crucial to know the available resources before submitting your Slurm job: if you exceed the limits set by the selected account and partition, the job submission will fail (see the submission example after the table).
| | gpu-k80 | gpu-gtx1080ti |
| --- | --- | --- |
| normal | | |
| gpu-course | | |
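For example, a job script that selects the gpu-course account on the gpu-k80 partition might look like the following. This is a minimal sketch: the job name and the requested time and GPU count are placeholders, and must stay within the limits of the chosen account/partition combination:

```bash
#!/bin/bash
#SBATCH --account=gpu-course     # Slurm account to charge the job to
#SBATCH --partition=gpu-k80      # partition (queue) to run in
#SBATCH --time=01:00:00          # must not exceed the partition's max runtime
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --job-name=gpu-demo     # placeholder job name

# Your workload goes here
srun nvidia-smi
```

Submit it with `sbatch job.sh`. If you are not associated with the requested account/partition pair, or the requested resources exceed its limits, the submission will fail.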