Slurm offers a wide range of controls for defining and enforcing resource limits, but we use only three: partitions, QoS objects, and accounts.
- **partitions** - set per-job limits such as the maximum runtime and the maximum number of nodes. Other limits can be set by associating the partition with a QoS object.
- **QoS objects** - set resource limits and priorities; a QoS is then associated with partitions, accounts, or jobs.
- **accounts** - set limits for all users associated with the account, for example the maximum number of jobs, CPUs, or memory. Limits can also be set by associating the account with QoS objects.
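To see which limits actually apply on the cluster, you can query Slurm directly. A minimal sketch, assuming the partition names used below; the `format=` field lists are illustrative and can be adjusted:

```bash
# Show the per-job limits defined by a partition (e.g. max runtime, max nodes)
scontrol show partition gpu-k80

# Show the limits and priorities attached to QoS objects
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPerUser

# Show the account/partition/QoS associations available to your user
sacctmgr show assoc user=$USER format=Account,Partition,QOS
```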
As an example, the table below shows all combinations of resource limits for two partitions (gpu-k80, gpu-gtx1080ti) and two accounts (normal, gpu-course).
- use the `--account` and `--partition` options of the Slurm commands to select the Slurm account and partition, respectively. It is crucial to know the available resources before submitting your Slurm job: if you exceed the limits set by the selected account and partition, the job submission will fail (see the submission example after the table).
| | gpu-k80 | gpu-gtx1080ti |
| --- | --- | --- |
| normal | | |
| gpu-course | | |
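For example, a job script that selects the gpu-course account on the gpu-k80 partition might look like the following. This is a minimal sketch: the job name and the requested time and GPU count are placeholders, and must stay within the limits of the chosen account/partition combination:

```bash
#!/bin/bash
#SBATCH --account=gpu-course     # Slurm account to charge the job to
#SBATCH --partition=gpu-k80      # partition (queue) to run in
#SBATCH --time=01:00:00          # must not exceed the partition's max runtime
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --job-name=gpu-demo     # placeholder job name

# Your workload goes here
srun nvidia-smi
```

Submit it with `sbatch job.sh`. If you are not associated with the requested account/partition pair, or the requested resources exceed its limits, the submission will fail.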