Introduction
Slurm's main job submission commands are: sbatch, salloc, and srun.
Note: Slurm does not automatically copy executable or data files to the nodes allocated to a job. The files must exist either on a local disk or in some global file system (e.g. NFS or CIFS). You can also use the sbcast command to transfer files to local storage on the allocated nodes.
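For example, as a minimal sketch (the file names are hypothetical), a job script could use sbcast to broadcast an executable to node-local storage before running it:
# Inside a job script: copy the executable to local storage on every allocated node, then run it
sbcast my_program /tmp/my_program
srun /tmp/my_program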
Command sbatch
Submit a job script for later execution.
Syntax
sbatch [options] Jobscript [args...]
Options
- Slurm batch job options are usually embedded in a job script, prefixed by #SBATCH directives
- Slurm options can also be passed as command line options or by setting Slurm input environment variables
- options passed on the command line override corresponding options embedded in the job script or in environment variables
- see the Passing options to Slurm commands section below for more details
Job script
Usually a job script consists of two parts:
- Slurm directives section:
  - identified by the #SBATCH directive at the beginning of each line in this section
  - where sbatch command line options are set. For reproducibility, use this section (instead of the command line or environment variables) to pass sbatch options. For legibility, use long form options.
- Job commands section:
  - commands in this section are executed on the resources assigned to the job. It is written in the scripting language identified by the interpreter directive (e.g. #!/bin/bash).
Example
First, create a job script (e.g. job01.sh):
#!/bin/bash
#SBATCH --mail-user=userid@uwaterloo.ca
#SBATCH --mail-type=end,fail
#SBATCH --partition=gpu-k80
#SBATCH --job-name="test-job-01"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=3
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G
#SBATCH --gres=gpu:k80:1
#SBATCH --output=stdout-%j.log
#SBATCH --error=stderr-%j.log

# Your job commands
srun --label nvidia-smi --query-gpu=gpu_name \
    --format=csv,noheader
Then submit the job script using the sbatch command:
sbatch job01.sh
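sbatch prints the id of the newly submitted job, which can then be used with other Slurm commands such as squeue; the job id shown below is only illustrative:
$ sbatch job01.sh
Submitted batch job 169
$ squeue -j 169    # check the state of the job (169 is a hypothetical job id)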
For more details on sbatch, read the man page (man sbatch) or visit the sbatch page on the SchedMD website.
Command salloc
- allocate resources (e.g. nodes), possibly with a set of constraints (e.g. number of processors per node), for later utilization
- used to allocate resources and spawn a shell, in which the srun command is used to launch parallel tasks
Syntax
salloc [options] [command [args...]]
Options
See Passing options to Slurm commands section below for more details.
Example
In this example, we use the salloc command to get an allocation and create a bash shell, then use srun to execute the nvidia-smi command.
$ salloc --partition=gpu-k80 --gres=gpu:k80:1 \
    --nodes=1 --ntasks-per-node=3 \
    /bin/bash
salloc: Granted job allocation 168
salloc: Waiting for resource configuration
salloc: Nodes gpu-pt1-01 are ready for job
Now use srun to run the nvidia-smi --query-gpu=gpu_name --format=csv,noheader command on the resources allocated by salloc. The --label option prepends each line of output with the remote task id.
$ srun --label nvidia-smi \
    --query-gpu=gpu_name --format=csv,noheader
2: Tesla K80
1: Tesla K80
0: Tesla K80
To exit the salloc session, use the exit command.
$ exit
exit
salloc: Relinquishing job allocation 168
salloc: Job allocation 168 has been revoked.
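Since salloc accepts an optional command (see the syntax above), the interactive session can also be replaced by a single invocation; this sketch assumes the same partition and GPU type as the example above, and the allocation is released as soon as the command finishes:
$ salloc --partition=gpu-k80 --gres=gpu:k80:1 \
    --nodes=1 --ntasks-per-node=3 \
    srun --label nvidia-smi --query-gpu=gpu_name --format=csv,noheader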
For more details on the salloc command, read the man page (man salloc) or visit the salloc page on the SchedMD website.
Command srun
- Run a parallel job on a cluster managed by Slurm. If run directly from the command line, srun will first create a resource allocation in which to run the parallel job.
- srun can also be run from within an sbatch script or a salloc shell
Syntax
srun [OPTIONS] executable [args...]
Options
See Passing options to Slurm commands section below for more details.
Example
$ srun --partition=gpu-k80 --gres=gpu:k80:1 \
    --nodes=1 --ntasks-per-node=3 \
    --label nvidia-smi --query-gpu=gpu_name \
    --format=csv,noheader
The output of the above example should look like:
2: Tesla K80
0: Tesla K80
1: Tesla K80
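srun can also be used to start an interactive shell on a compute node via its --pty option; the sketch below assumes the same gpu-k80 partition used in the examples above:
$ srun --partition=gpu-k80 --gres=gpu:k80:1 \
    --nodes=1 --ntasks=1 --pty /bin/bash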
For more details on the srun command, read the man page (man srun) or visit the srun page on the SchedMD website.
Passing options to Slurm commands
- Command options can be passed in the following ways, listed in order of precedence (illustrated in the sketch below):
  - On the command line
  - In input environment variables
  - In the job script (for the sbatch command), prefixed by the #SBATCH directive
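As a sketch of this precedence, the same time limit could be set in all three ways; the values below are arbitrary examples, and the command line value would win, followed by the environment variable, then the script directive:
# 1. In the job script (lowest precedence)
#SBATCH --time=00:10:00
# 2. As an input environment variable (overrides the script directive)
export SBATCH_TIMELIMIT=00:20:00
# 3. On the command line (overrides both)
sbatch --time=00:30:00 job01.sh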
The table below shows the most commonly used options. All of these options can be used with the sbatch command; the srun and salloc commands support most of them.
Options for all job submit commands (sbatch, srun and salloc) | |
Option | Description |
---|---|
--mail-user | Mail address to contact job owner. Must specify a valid email address! |
--mail-type | When to notify a job owner: none, all, begin, end, fail, requeue, array_tasks |
--account | Which account to charge. Regular users don't need to specify this option. |
--job-name | Specify a job name |
--time | Expected runtime of the job. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes". For example: --time 7-12:30 means 7 days, 12 hours and 30 minutes |
--mem-per-cpu | Minimum memory required per allocated CPU in megabytes. Different units can be specified using the suffix [K|M|G] |
--tmp | Specify the amount of disk space that must be available on the compute node(s). The local scratch space for the job is referenced by the variable TMPDIR. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T]. |
--ntasks | Number of tasks (processes). Used for MPI jobs that may run distributed across multiple compute nodes |
--nodes | Request a certain number of nodes |
--ntasks-per-node | Specify how many tasks will run on each allocated node. Meant to be used with --nodes. |
--cpus-per-task | Number of CPUs per task (threads). Used for shared memory jobs that run locally on a single compute node |
--partition | The "all" partition is the default partition. A different partition must be requested with the --partition option! |
--dependency | Defer the start of this job until the specified dependencies have been satisfied. See man sbatch for a description of all valid dependency types |
--hold | Submit job in hold state. Job is not allowed to run until explicitly released |
--immediate | Only submit the job if all requested resources are immediately available |
--exclusive | Use the compute node(s) exclusively, i.e. do not share nodes with other jobs. CAUTION: Only use this option if you are an experienced user, and you really understand the implications of this feature. If used improperly, the use of this option can lead to a massive waste of computational resources |
--constraint | Request nodes with certain features. This option allows you to request a homogeneous pool of nodes for your MPI job |
Options applying only to sbatch and srun | |
--output | Redirect standard output. All directories specified in the path must exist before the job starts! |
--error | Redirect standard error. All directories specified in the path must exist before the job starts! |
--test-only | Validate the batch script and return the estimated start time considering the current cluster state |
Options applying only to sbatch | |
--array | Submit an array job. Use "%" to specify the max number of tasks allowed to run concurrently. |
--chdir | Set the working directory of the job. All relative paths used in the job script are relative to this directory. By default, it is the working directory of the calling process. |
--parsable | Print the job id only |
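For instance, here is a brief sketch using the sbatch-only options above; the script names, array range and "%2" throttle are hypothetical:
# Array job: tasks 0-9, at most 2 running concurrently; %A is the job id, %a the array task index
sbatch --array=0-9%2 --output=array-%A_%a.log array_job.sh
# Job dependency: run analyze.sh only after preprocess.sh completes successfully;
# --parsable makes sbatch print just the job id so it can be captured in a variable
jobid=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:${jobid} analyze.sh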
- Below are some of the input environment variables for the sbatch command and their equivalent command line options
- salloc also uses similar environment variables (SALLOC_*). Many of the salloc environment variables have an sbatch equivalent
  - for example, the SBATCH_ACCOUNT environment variable is equivalent to the SALLOC_ACCOUNT variable
- srun input environment variables start with SLURM_* (e.g. SLURM_ACCOUNT)
Environment variable | Equivalent option |
---|---|
SBATCH_ACCOUNT | --account |
SBATCH_ACCTG_FREQ | --acctg-freq |
SBATCH_ARRAY_INX | --array |
SBATCH_CLUSTERS or SLURM_CLUSTERS | --clusters |
SBATCH_CONN_TYPE | --conn-type |
SBATCH_CONSTRAINT | --constraint |
SBATCH_CORE_SPEC | --core-spec |
SBATCH_DEBUG | --verbose |
SBATCH_DELAY_BOOT | --delay-boot |
SBATCH_DISTRIBUTION | --distribution |
SBATCH_EXCLUSIVE | --exclusive |
SBATCH_EXPORT | --export |
SBATCH_GEOMETRY | --geometry |
SBATCH_GET_USER_ENV | --get-user-env |
SBATCH_GRES_FLAGS | --gres-flags |
SBATCH_HINT or SLURM_HINT | --hint |
SBATCH_IGNORE_PBS | --ignore-pbs |
SBATCH_IMMEDIATE | --immediate |
SBATCH_IOLOAD_IMAGE | --ioload-image |
SBATCH_JOBID | --jobid |
SBATCH_JOB_NAME | --job-name |
SBATCH_LINUX_IMAGE | --linux-image |
SBATCH_MEM_BIND | --mem-bind |
SBATCH_MLOADER_IMAGE | --mloader-image |
SBATCH_NETWORK | --network |
SBATCH_NO_REQUEUE | --no-requeue |
SBATCH_NO_ROTATE | --no-rotate |
SBATCH_OPEN_MODE | --open-mode |
SBATCH_OVERCOMMIT | --overcommit |
SBATCH_PARTITION | --partition |
SBATCH_POWER | --power |
SBATCH_PROFILE | --profile |
SBATCH_QOS | --qos |
SBATCH_RAMDISK_IMAGE | --ramdisk-image |
SBATCH_RESERVATION | --reservation |
SBATCH_REQUEUE | --requeue |
SBATCH_SIGNAL | --signal |
SBATCH_SPREAD_JOB | --spread-job |
SBATCH_THREAD_SPEC | --thread-spec |
SBATCH_TIMELIMIT | --time |
SBATCH_USE_MIN_NODES | --use-min-nodes |
SBATCH_WAIT | --wait |
SBATCH_WAIT_ALL_NODES | --wait-all-nodes |
SBATCH_WCKEY | --wckey |
SLURM_CONF | The location of the Slurm configuration file |
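As a short sketch of the naming convention across the three commands (the account name is hypothetical), the same default account could be set for each submit command with:
export SBATCH_ACCOUNT=myaccount   # read by sbatch (equivalent to --account)
export SALLOC_ACCOUNT=myaccount   # read by salloc
export SLURM_ACCOUNT=myaccount    # read by srun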