Teaching GPU cluster Slurm Accounts

Slurm accounts are different from login accounts (Nexus/WatIAM).

  • Slurm accounts are used to track resource utilization so that Slurm can enforce limits on particular users or groups of users.
  • Slurm accounts, together with Slurm partitions and Slurm QoS objects, are used to control and limit access to cluster resources.
  • To use Slurm, your Nexus/WatIAM user ID has to be associated with one or more Slurm accounts.
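
To check which Slurm accounts your user ID is associated with, something like the following sketch may help. The function name list_slurm_accounts is hypothetical; it only wraps the standard sacctmgr "show associations" query, and the output on this cluster may differ from what you expect.

```shell
# Print the Slurm accounts (with partition and QoS) associated with a user.
# Usage: list_slurm_accounts [user]   (defaults to the current login name)
list_slurm_accounts() {
    local user="${1:-$(id -un)}"
    # --noheader and --parsable2 give plain, pipe-delimited output.
    sacctmgr --noheader --parsable2 show associations \
        user="$user" format=Account,Partition,QOS
}
```

Run list_slurm_accounts on a cluster login node. If it prints nothing, your user ID is probably not yet associated with any Slurm account.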

Teaching Slurm accounts

The teaching Slurm cluster uses two accounts: normal and gpu-course.

  • normal account: this is the default account, so you do not need to pass the --account=normal option; Slurm selects it automatically. This account gives access to the machines in the gpu-gen and gpu-k80 partitions.

  • gpu-course account: use this account for the machines in the gpu-gtx1080ti partition. To access them, pass both the --account=gpu-course and --partition=gpu-gtx1080ti options to the sbatch or srun command. In an sbatch script, that means including the following two options:

    #SBATCH --account=gpu-course
    #SBATCH --partition=gpu-gtx1080ti
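
As a minimal sketch, a complete batch script using these two options might look like the one below. Everything other than the --account and --partition lines is an assumption made for illustration: the file name gpu_course_job.sh, the job name, the resource amounts, and the --gres=gpu:1 form of the GPU request are not taken from this page.

```shell
# Write a small batch script for the gpu-gtx1080ti partition.
cat > gpu_course_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=gpu-course
#SBATCH --partition=gpu-gtx1080ti
# The remaining values are illustrative; keep them within the account limits.
#SBATCH --job-name=gpu-test
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
# %j in the output file name expands to the job ID.
#SBATCH --output=gpu-test-%j.out

# Print the GPU(s) visible to this job.
nvidia-smi
EOF

# Submit the script only when sbatch is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch gpu_course_job.sh
fi
```

The same two options can be given on the command line instead, for example as arguments to srun.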

Slurm quality of service (QoS) objects are used to set resource limits, job priority, and job preemption rules. The resource limits of a Slurm account are set by associating the account with one of the QoS objects. The table below shows the currently defined accounts and their respective resource limits.

    Account Name    Max jobs    CPUs    Mem     GPUs    Default
    normal          2           2       8GB     1       YES
    gpu-course      3           8       16GB    2       NO
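
To make the table concrete, here is a small sketch that compares a planned request with the limits listed above. The function name check_request is hypothetical and the limits are copied by hand from the table, so it will not follow later changes to the QoS settings. Since the limits appear to be per-user totals, a request that passes this check may still have to wait while your other jobs are running.

```shell
# Compare a planned request with the account limits from the table.
# Usage: check_request <account> <cpus> <mem_gb> <gpus>
check_request() {
    local account="$1" cpus="$2" mem_gb="$3" gpus="$4"
    local max_cpus max_mem_gb max_gpus
    case "$account" in
        normal)     max_cpus=2; max_mem_gb=8;  max_gpus=1 ;;
        gpu-course) max_cpus=8; max_mem_gb=16; max_gpus=2 ;;
        *) echo "unknown account: $account" >&2; return 2 ;;
    esac
    if [ "$cpus" -gt "$max_cpus" ] || [ "$mem_gb" -gt "$max_mem_gb" ] ||
       [ "$gpus" -gt "$max_gpus" ]; then
        echo "request exceeds limits for $account"
        return 1
    fi
    echo "request fits within limits for $account"
}
```

For example, check_request normal 4 8 1 reports that the request exceeds the limits, because the normal account allows at most 2 CPUs.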

Commands for Slurm account info

The resource limits of a Slurm account are set through a Slurm QoS object, so to look up the limits for an account you need the name of the QoS associated with it. You can get that from the table shown above, or by using the Slurm sacctmgr command. For example, the "normal" account is associated with the "normal" QoS, so to see its resource limits run the following command:

    sacctmgr show qos normal format=Name%15,MaxJobsPU,MaxSubmitJobsPU,MaxTRESPU%40
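
When the QoS name is not known in advance, the two lookups can be chained. The sketch below is one possible way to do it, not an official tool: the function name show_account_limits is hypothetical, and it assumes the QOS field of "sacctmgr show associations" lists the QoS names attached to your association with the account.

```shell
# Show the per-user limits of every QoS attached to one of your accounts.
# Usage: show_account_limits <account> [user]
show_account_limits() {
    local account="$1"
    local user="${2:-$(id -un)}"
    local qos_names qos
    # The QOS field may hold a comma-separated list; split it and de-duplicate.
    qos_names=$(sacctmgr --noheader --parsable2 show associations \
        user="$user" account="$account" format=QOS | tr ',' '\n' | sort -u)
    if [ -z "$qos_names" ]; then
        echo "no QoS found for account '$account' and user '$user'" >&2
        return 1
    fi
    # Query each QoS with the same format string as the command above.
    for qos in $qos_names; do
        sacctmgr show qos "$qos" \
            format=Name%15,MaxJobsPU,MaxSubmitJobsPU,MaxTRESPU%40
    done
}
```

For instance, show_account_limits gpu-course would print the limits for the QoS behind the gpu-course account, provided your user ID is associated with that account.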