Running batch jobs
Submit Jobs Via The SLURM Queueing System
This is the preferred method of submitting batch jobs to the cluster queueing system and of running jobs interactively.
Most Important Queue Commands:
Here we list the most commonly used queueing commands. If you are migrating from a different scheduling system, this cheat sheet may be useful. A compact two-page overview of the most important commands is also available.
Use The 'sinfo' Command To Display Information About Available Resources:
If you use the command without any options, it displays all available partitions. Use the -p switch to select a specific partition, for instance:
astro06:> sinfo -p astro2
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
astro2       up 10-00:00:0      1  down* node458
astro2       up 10-00:00:0     13  alloc node[454-457,459-462,480-481]
astro2       up 10-00:00:0     18   idle node[463-479,482]
The command displays how many nodes in the partition are offline (down), busy (alloc), and still available (idle); the trailing * in down* marks a node that is not responding. For each state, a NODELIST is displayed. The TIMELIMIT column shows the maximum job duration allowed for the partition in days-hours:minutes:seconds format; here it is 10 days, with the last digit cut off by the default column width. You can find more information about how to use the sinfo command on the official SLURM man pages.
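If you only want the number of nodes in each state rather than the node lists, the output can be post-processed. The sketch below defines a hypothetical shell function, summarize_sinfo, that reads the default sinfo table from standard input and sums the NODES column for each STATE. It assumes the default column order (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST) and strips state flag suffixes such as the * in down*.

```bash
# Hypothetical helper: sum the NODES column of default `sinfo` output per STATE.
# Assumes the default column order:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
summarize_sinfo() {
  awk 'NR > 1 {                      # skip the header line
         state = $5
         sub(/[^a-z]+$/, "", state)  # drop flag suffixes such as "*" (not responding)
         count[state] += $4
       }
       END { for (s in count) print s, count[s] }' | sort
}

# Usage (on the cluster):  sinfo -p astro2 | summarize_sinfo
```

As a built-in alternative, sinfo -s (--summarize) prints allocated/idle/other/total node counts per partition without the per-state node lists.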
Use The 'squeue' Command To Display Information About Scheduled Jobs:
astro06:> squeue -p astro_long
 JOBID  PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
566136 astro_long jobname1 username  R 47:22     1 node485
566135 astro_long jobname2 username  R 54:22     1 node481
The command lists each job's ID (JOBID), partition, name, owner, state (ST), elapsed run time, number of nodes, and either the nodes it runs on or, for a pending job, the reason it is still waiting (NODELIST(REASON)). Use the JOBID of your job to modify or cancel a scheduled or already running job (see below). The ST column shows the state of the job; the letters stand for PD (pending), R (running), CA (cancelled), CG (completing), CD (completed), F (failed), TO (timeout), and NF (node failure).
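When scripting around squeue, it can help to translate the compact state codes into words. The hypothetical function below covers only the codes listed above; SLURM defines further states, which it reports as unknown. It pairs naturally with squeue's -o '%i %t' format, which prints just the job ID and the compact state.

```bash
# Hypothetical helper: translate the compact squeue state codes (ST column)
# listed above into words. Other SLURM states are reported as unknown.
slurm_state_name() {
  case "${1:-}" in
    PD) echo "pending" ;;
    R)  echo "running" ;;
    CA) echo "cancelled" ;;
    CG) echo "completing" ;;
    CD) echo "completed" ;;
    F)  echo "failed" ;;
    TO) echo "timeout" ;;
    NF) echo "node failure" ;;
    *)  echo "unknown (${1:-})"; return 1 ;;
  esac
}

# Usage (on the cluster): print each of your jobs with its state spelled out.
#   squeue -h -u "$USER" -o '%i %t' | while read -r id st; do
#     echo "$id $(slurm_state_name "$st")"
#   done
```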
Useful command line switches for squeue include -u (or --user) to list only the jobs that belong to a specific user, and -p (or --partition) to list only the jobs in a given partition, as in the example above. You can find more information about how to use the squeue command on the official SLURM man pages.
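Combining these switches, a hypothetical helper such as count_user_jobs can report how many jobs a user has, optionally restricted to one state. It relies only on standard squeue options: --noheader (-h) suppresses the header line, --format=%i (-o) prints just the job ID, and --states (-t) filters by a state name such as PENDING or RUNNING.

```bash
# Hypothetical helper: count a user's jobs, optionally only those in one state.
#   $1  user name (defaults to the current user)
#   $2  optional state filter, e.g. PENDING or RUNNING
count_user_jobs() {
  local user="${1:-$(id -un)}"
  if [ -n "${2:-}" ]; then
    squeue --noheader --user="$user" --states="$2" --format=%i
  else
    squeue --noheader --user="$user" --format=%i
  fi | awk 'END { print NR }'   # one output line per job
}

# Usage (on the cluster):  count_user_jobs "$USER" PENDING
```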
Use The 'scancel' Command To Cancel A Scheduled Or Running Job:
astro06:> scancel 566136
You can find more information about how to use the scancel command on the official SLURM man pages.
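Besides job IDs, scancel also accepts filters. The sketch below wraps them in a hypothetical function, cancel_pending_jobs, that cancels all of a user's pending jobs in one partition while leaving running jobs alone. --user (-u), --state (-t) and --partition (-p) are standard scancel options; check the filter carefully before using it, since scancel only asks for confirmation when called with -i (--interactive).

```bash
# Hypothetical helper: cancel all pending jobs of a user in one partition.
# Running jobs are not touched because of the --state=PENDING filter.
#   $1  partition name (required)
#   $2  user name (defaults to the current user)
cancel_pending_jobs() {
  if [ -z "${1:-}" ]; then
    echo "usage: cancel_pending_jobs PARTITION [USER]" >&2
    return 2
  fi
  local partition="$1"
  local user="${2:-$(id -un)}"
  scancel --user="$user" --state=PENDING --partition="$partition"
}

# Usage (on the cluster):  cancel_pending_jobs astro_long
```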
Use The 'srun' Command To Run Jobs Interactively:
You can run serial, OpenMP-parallel or MPI-parallel code interactively using the srun command. Always specify the partition to run on with the -p command line switch. When running an MPI job, use the -n switch to specify the number of MPI tasks you require. Command line arguments for your program are passed at the end, after the executable name.
astro06:> srun -p astro_devel -n 20 <executable> [args...]
You can find more information about how to use the srun command on the official SLURM man pages.
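For OpenMP code, a common pattern is to request a single task with several CPU cores via srun's -c (--cpus-per-task) switch and set OMP_NUM_THREADS to match. The hypothetical wrapper run_openmp below sketches this; the partition name and thread count are placeholders for whatever your job needs. srun forwards the caller's environment by default, so OMP_NUM_THREADS reaches the program.

```bash
# Hypothetical helper: run an OpenMP program interactively as one task
# with THREADS cores, and tell the OpenMP runtime to use that many threads.
#   run_openmp PARTITION THREADS EXECUTABLE [ARGS...]
run_openmp() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_openmp PARTITION THREADS EXECUTABLE [ARGS...]" >&2
    return 2
  fi
  local partition="$1" threads="$2"
  shift 2
  # -n 1: a single task; -c: CPU cores reserved for that task.
  # srun forwards the environment by default, so OMP_NUM_THREADS reaches the program.
  OMP_NUM_THREADS="$threads" srun -p "$partition" -n 1 -c "$threads" "$@"
}

# Usage (on the cluster):  run_openmp astro_devel 8 ./my_program input.dat
```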
Use The 'sbatch' Command To Queue A Job Via A Submission Script:
astro06:> sbatch [additional options] job-submission-script.sh
You can find more information about how to use the sbatch command on the official SLURM man pages.
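A submission script is an ordinary shell script in which lines starting with #SBATCH carry sbatch options. To illustrate the layout, the hypothetical helper make_job_script below prints a minimal script for an MPI program; the option values (partition, task count, time limit) are placeholders, and the time limit uses the same days-hours:minutes:seconds format that sinfo reports. Options given on the sbatch command line take precedence over the #SBATCH lines in the script.

```bash
# Hypothetical helper: print a minimal SLURM submission script for an MPI program.
#   make_job_script PARTITION NTASKS TIME EXECUTABLE [ARGS...]
# TIME uses days-hours:minutes:seconds, e.g. 1-00:00:00.
# In --output, SLURM replaces %j with the job ID.
# Note: arguments are written unquoted, so they must not contain spaces.
make_job_script() {
  if [ "$#" -lt 4 ]; then
    echo "usage: make_job_script PARTITION NTASKS TIME EXECUTABLE [ARGS...]" >&2
    return 2
  fi
  local partition="$1" ntasks="$2" walltime="$3"
  shift 3
  local name
  name="$(basename "$1")"   # job name taken from the executable
  cat <<EOF
#!/bin/bash
#SBATCH --partition=${partition}
#SBATCH --ntasks=${ntasks}
#SBATCH --time=${walltime}
#SBATCH --job-name=${name}
#SBATCH --output=slurm-%j.out

srun $*
EOF
}

# Usage (on the cluster):
#   make_job_script astro_long 20 1-00:00:00 ./my_program input.dat > job-submission-script.sh
#   sbatch job-submission-script.sh
```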