Introduction
Slurm's main job submission commands are: sbatch, salloc, and srun.
Note: Slurm does not automatically copy executable or data files to the nodes allocated to a job. The files must exist either on a local disk or in some global file system (e.g. NFS or CIFS). You can also use the sbcast command to transfer files to local storage on allocated nodes.
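For instance, a minimal sketch of using sbcast inside a job script (my_program and the /tmp destination are placeholders for illustration; sbcast must be run from within an allocation):
# Copy the program from the shared file system to local storage on every
# node in the allocation, then run the local copy in each task
sbcast my_program /tmp/my_program
srun /tmp/my_program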
Command sbatch
Submit a job script for later execution.
Syntax
sbatch [options] jobscript [args...]
Options
- Slurm batch job options are usually embedded in a job script, prefixed by #SBATCH directives
- Slurm options can also be passed as command line options or by setting Slurm input environment variables
- Options passed on the command line override corresponding options embedded in the job script or set in environment variables
- See the Passing options to Slurm commands section below for more details
Job script
Usually a job script consists of two parts:
- Slurm directives section:
  - identified by the #SBATCH directive at the beginning of each line in this section
  - this is where sbatch command line options are set. For reproducibility, use this section (instead of the command line or environment variables) to pass sbatch options. For legibility, use long form options.
- Job commands section:
  - commands in this section are executed on the assigned node resources
  - this section is written in the scripting language identified by the interpreter directive (e.g. #!/bin/bash)
Example
First, create a job script (e.g. sbatch-job01.sh):
#!/bin/bash
#SBATCH --mail-user=userid@uwaterloo.ca
#SBATCH --mail-type=end,fail
#SBATCH --partition=cpu_pr3
#SBATCH --job-name="sbatch Example 01"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=3
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log
# Your job commands
srun --label hostname
Then submit the job script using the sbatch command:
sbatch sbatch-job01.sh
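On success, sbatch prints the id of the submitted job in a line similar to the following (the id itself will differ):
Submitted batch job 3070
If you only need the job id, for example to capture it in a shell script, the --parsable option (described in the table in the Passing options to Slurm commands section below) prints just the id:
jobid=$(sbatch --parsable sbatch-job01.sh)
echo "Submitted job ${jobid}"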
For more details on sbatch, read the man page (man sbatch) or visit the sbatch page on the SchedMD website.
Command salloc
- Allocate resources (e.g. nodes), possibly with a set of constraints (e.g. number of processors per node), for later utilization
- Used to allocate resources and spawn a shell, in which the srun command is used to launch parallel tasks
Syntax
salloc [options] [command [args...]]
Options
See Passing options to Slurm commands section below for more details.
Example
In this example, we use the salloc command to get an allocation and create a bash shell, then use srun to execute the nvidia-smi command.
$ salloc --partition=gpu_p100 --gres=gpu:p100:1 \
--nodes=1 --ntasks-per-node=3 \
/bin/bash
salloc: Granted job allocation 3071
salloc: Waiting for resource configuration
salloc: Nodes gpu-pr1-01 are ready for job
Now use srun to run the nvidia-smi --query-gpu=gpu_name --format=csv,noheader command on the resources allocated by salloc. The srun --label option will prepend each line of output with the remote task id.
$ srun --label nvidia-smi \
--query-gpu=gpu_name --format=csv,noheader
2: Tesla P100-PCIE-16GB
0: Tesla P100-PCIE-16GB
1: Tesla P100-PCIE-16GB
To exit the salloc session, use the exit command.
$ exit
exit
salloc: Relinquishing job allocation 3071
salloc: Job allocation 3071 has been revoked.
For more details on the salloc command, read the man page (man salloc) or visit the salloc page on the SchedMD website.
Command srun
- Run a parallel job on a cluster managed by Slurm. If run directly from the command line, srun will first create a resource allocation in which to run the parallel job. srun can also be run from within an sbatch script or an salloc shell.
Syntax
srun [OPTIONS] executable [args...]
Options
See Passing options to Slurm commands section below for more details.
Example
$ srun --partition=gpu_p100 --gres=gpu:p100:1 \
--nodes=1 --ntasks-per-node=3 \
--label nvidia-smi --query-gpu=gpu_name \
--format=csv,noheader
The output of the above example should look like:
2: Tesla P100-PCIE-16GB
0: Tesla P100-PCIE-16GB
1: Tesla P100-PCIE-16GB
For more details on the srun command, read the man page (man srun) or visit the srun page on the SchedMD website.
Passing options to Slurm commands
- Command options can be passed in the following ways, listed in order of precedence (see the example below):
  - On the command line
  - Input environment variables
  - In the job script (for the sbatch command), prefixed by the #SBATCH directive
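For instance, a minimal sketch of this precedence, assuming the sbatch-job01.sh script from the earlier example (which embeds #SBATCH --time=00:10:00):
# The input environment variable overrides the #SBATCH directive in the script ...
export SBATCH_TIMELIMIT=00:20:00
sbatch sbatch-job01.sh                     # submitted with a 20 minute time limit

# ... and a command line option overrides both the environment variable and the directive
sbatch --time=00:30:00 sbatch-job01.sh     # submitted with a 30 minute time limit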
The table below shows the most commonly used options. All of these options can be used with the sbatch command; the srun and salloc commands support most of them.
Options for all job submit commands (sbatch, srun and salloc)

Option | Description |
---|---|
--mail-user | Mail address to contact the job owner. Must specify a valid email address! |
--mail-type | When to notify a job owner: none, all, begin, end, fail, requeue, array_tasks |
--account | Which account to charge. Regular users don't need to specify this option. |
--job-name | Specify a job name |
--time | Expected runtime of the job. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes". For example: --time 7-12:30 means 7 days, 12 hours and 30 minutes |
--mem-per-cpu | Minimum memory required per allocated CPU in megabytes. Different units can be specified using the suffix [K|M|G] |
--tmp | Specify the amount of disk space that must be available on the compute node(s). The local scratch space for the job is referenced by the variable TMPDIR. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T]. |
--ntasks | Number of tasks (processes). Used for MPI jobs that may run distributed across multiple compute nodes |
--nodes | Request a certain number of nodes |
--ntasks-per-node | Specify how many tasks will run on each allocated node. Meant to be used with --nodes. |
--cpus-per-task | Number of CPUs per task (threads). Used for shared memory jobs that run locally on a single compute node |
--partition | Request a specific partition. The "all" partition is the default; a different partition must be requested with this option! |
--dependency | Defer the start of this job until the specified dependencies have been satisfied. See man sbatch for a description of all valid dependency types |
--hold | Submit the job in a held state. The job is not allowed to run until explicitly released |
--immediate | Only submit the job if all requested resources are immediately available |
--exclusive | Use the compute node(s) exclusively, i.e. do not share nodes with other jobs. CAUTION: Only use this option if you are an experienced user and you really understand the implications of this feature. If used improperly, this option can lead to a massive waste of computational resources |
--constraint | Request nodes with certain features. This option allows you to request a homogeneous pool of nodes for your MPI job |

Options applying only to sbatch and srun

Option | Description |
---|---|
--output | Redirect standard output. All directories specified in the path must exist before the job starts! |
--error | Redirect standard error. All directories specified in the path must exist before the job starts! |
--test-only | Validate the batch script and return the estimated start time considering the current cluster state |

Options applying only to sbatch

Option | Description |
---|---|
--array | Submit an array job. Use "%" to specify the max number of tasks allowed to run concurrently (see the example following this table). |
--chdir | Set the current working directory. All relative paths used in the job script are relative to this directory. By default, the current working directory is the calling process working directory. |
--parsable | Print the job id only |
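As a brief sketch of two of the options above (the script names array-job.sh and postprocess-job.sh are placeholders): submit a 10-task array job with at most 2 tasks running concurrently, then submit a second job that starts only after the whole array completes successfully (afterok is one of the dependency types described in man sbatch):
# Array job: tasks 1-10, with "%2" limiting it to 2 concurrently running tasks
jobid=$(sbatch --parsable --array=1-10%2 array-job.sh)

# This job is deferred until all tasks of the array job finish successfully
sbatch --dependency=afterok:${jobid} postprocess-job.sh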
- Below are some of the input environment variables for the sbatch command and their equivalent command line options (a usage example follows the table).
- salloc uses similar environment variables (SALLOC_*). Many of the salloc environment variables have an sbatch equivalent.
  - For example, the SBATCH_ACCOUNT environment variable is equivalent to the SALLOC_ACCOUNT variable.
- srun environment variables start with SLURM_* (e.g. SLURM_ACCOUNT).
Environment variable | Equivalent option |
---|---|
SBATCH_ACCOUNT | --account |
SBATCH_ACCTG_FREQ | --acctg-freq |
SBATCH_ARRAY_INX | --array |
SBATCH_CLUSTERS or SLURM_CLUSTERS | --clusters |
SBATCH_CONN_TYPE | --conn-type |
SBATCH_CONSTRAINT | --constraint |
SBATCH_CORE_SPEC | --core-spec |
SBATCH_DEBUG | --verbose |
SBATCH_DELAY_BOOT | --delay-boot |
SBATCH_DISTRIBUTION | --distribution |
SBATCH_EXCLUSIVE | --exclusive |
SBATCH_EXPORT | --export |
SBATCH_GEOMETRY | --geometry |
SBATCH_GET_USER_ENV | --get-user-env |
SBATCH_GRES_FLAGS | --gres-flags |
SBATCH_HINT or SLURM_HINT | --hint |
SBATCH_IGNORE_PBS | --ignore-pbs |
SBATCH_IMMEDIATE | --immediate |
SBATCH_IOLOAD_IMAGE | --ioload-image |
SBATCH_JOBID | --jobid |
SBATCH_JOB_NAME | --job-name |
SBATCH_LINUX_IMAGE | --linux-image |
SBATCH_MEM_BIND | --mem-bind |
SBATCH_MLOADER_IMAGE | --mloader-image |
SBATCH_NETWORK | --network |
SBATCH_NO_REQUEUE | --no-requeue |
SBATCH_NO_ROTATE | --no-rotate |
SBATCH_OPEN_MODE | --open-mode |
SBATCH_OVERCOMMIT | --overcommit |
SBATCH_PARTITION | --partition |
SBATCH_POWER | --power |
SBATCH_PROFILE | --profile |
SBATCH_QOS | --qos |
SBATCH_RAMDISK_IMAGE | --ramdisk-image |
SBATCH_RESERVATION | --reservation |
SBATCH_REQUEUE | --requeue |
SBATCH_SIGNAL | --signal |
SBATCH_SPREAD_JOB | --spread-job |
SBATCH_THREAD_SPEC | --thread-spec |
SBATCH_TIMELIMIT | --time |
SBATCH_USE_MIN_NODES | --use-min-nodes |
SBATCH_WAIT | --wait |
SBATCH_WAIT_ALL_NODES | --wait-all-nodes |
SBATCH_WCKEY | --wckey |
SLURM_CONF | The location of the Slurm configuration file |
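For example, a minimal sketch of passing sbatch options through input environment variables instead of command line options (job.sh is a placeholder script name; cpu_pr3 is the partition used in the earlier example):
# These variables are read by sbatch at submission time and take the place
# of the --partition and --job-name command line options
export SBATCH_PARTITION=cpu_pr3
export SBATCH_JOB_NAME="env-var-example"
sbatch job.sh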