Running batch jobs

Latest revision as of 14:45, 15 November 2023

Submit Jobs Via The SLURM Queueing System

This is the preferred method of submitting batch jobs to the cluster queueing system and of running jobs interactively.

Most Important Queue Commands:

Here we list the most commonly used queueing commands. If you are migrating from a different scheduling system, this cheat sheet (https://slurm.schedmd.com/rosetta.pdf#readme) may be useful for you. There is also a compact two-page overview (https://slurm.schedmd.com/pdfs/summary.pdf#readme) of the most important commands.

Use The 'Sinfo' Command To Display Information About Available Resources:

If you use the command without any options, it will display all available partitions. Use the -p switch to select a specific partition, for instance:

astro06:> sinfo -p astro2
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
astro2       up 10-00:00:0      1  down* node458
astro2       up 10-00:00:0     13  alloc node[454-457,459-462,480-481]
astro2       up 10-00:00:0     18   idle node[463-479,482]

The command displays how many nodes in the partition are offline (down), busy (alloc), and still available (idle). For each state, a NODELIST is displayed. The TIMELIMIT column shows the maximum job duration allowed for the partition in days-hours:minutes:seconds format. You can find more information about how to use the sinfo command (https://slurm.schedmd.com/sinfo.html#readme) on the official SLURM man pages (https://slurm.schedmd.com/man_index.html#readme).
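
As a rough sketch of how the NODES and STATE columns can be tallied in a script, the following shell function sums the node counts per state. It assumes the default six-column sinfo layout shown above and uses the sample listing in place of a live call, so it runs without access to the cluster. The function name count_nodes_by_state is made up for this example. The trailing asterisk that SLURM appends to the state of a non-responding node is stripped before counting.

```shell
#!/bin/sh
# Sum the NODES column (field 4) per STATE (field 5) from default
# `sinfo` output read on stdin. Assumes the six-column layout shown above.
count_nodes_by_state() {
    awk 'NR > 1 {                      # skip the header line
             gsub(/\*$/, "", $5)       # "down*" -> "down"
             total[$5] += $4
         }
         END { for (s in total) print s, total[s] }' | sort
}

# Sample copied from the listing above; on the cluster you would pipe in
# the output of `sinfo -p astro2` instead.
sample='PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
astro2       up 10-00:00:0      1  down* node458
astro2       up 10-00:00:0     13  alloc node[454-457,459-462,480-481]
astro2       up 10-00:00:0     18   idle node[463-479,482]'

printf '%s\n' "$sample" | count_nodes_by_state
```

On the cluster itself, the same function could be fed directly: sinfo -p astro2 | count_nodes_by_state.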

Use The 'Squeue' Command To Display Information About Scheduled Jobs:

astro06:> squeue -p astro_long
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 566136    astro_long jobname1 username  R      47:22      1 node485
 566135    astro_long jobname2 username  R      54:22      1 node481

The command displays a table with useful information. Use the JOBID of your job to modify or cancel a scheduled or already running job (see below). The status column (ST) shows the state of each job. The codes stand for: PD (pending), R (running), CA (cancelled), CG (completing), CD (completed), F (failed), TO (timeout), and NF (node failure).
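
The state codes can also be decoded in a script. The sketch below maps each ST code listed above to its name and applies the mapping to the sample table, assuming the default squeue column order (JOBID first, ST fifth). The function name state_name is hypothetical, and the sample text stands in for a live squeue call.

```shell
#!/bin/sh
# Translate a squeue ST code into the state name listed above.
state_name() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CA) echo "cancelled" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timeout" ;;
        NF) echo "node failure" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Sample copied from the listing above; on the cluster you would pipe in
# the output of squeue instead.
sample=' JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 566136    astro_long jobname1 username  R      47:22      1 node485
 566135    astro_long jobname2 username  R      54:22      1 node481'

# Print "JOBID: state" for every job line (the header is skipped).
printf '%s\n' "$sample" | awk 'NR > 1 { print $1, $5 }' |
while read -r jobid code; do
    echo "$jobid: $(state_name "$code")"
done
```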

Useful command line switches for squeue include -u (or --user), which lists only the jobs that belong to a specific user. You can find more information about how to use the squeue command (https://slurm.schedmd.com/squeue.html#readme) on the official SLURM man pages.

Use The 'Scancel' Command To Cancel A Scheduled Or Running Job:

astro06:> scancel 566136

You can find more information about how to use the scancel command (https://slurm.schedmd.com/scancel.html#readme) on the official SLURM man pages.

Use The 'Srun' Command To Run Jobs Interactively:

You can run serial, OpenMP-parallel, or MPI-parallel code interactively using the srun command. Always specify the partition to run on via the -p command line switch. When running an MPI job, use the -n switch to specify the number of MPI tasks that you require. Command line arguments for your program can be passed at the end.

astro06:> srun -p astro_devel -n 20 <executable> [args...]

You can find more information about how to use the srun command (https://slurm.schedmd.com/srun.html#readme) on the official SLURM man pages.

Use The 'Sbatch' Command To Queue A Job Via A Submission Script:

astro06:> sbatch [additional options] job-submission-script.sh

You can find more information about how to use the sbatch command (https://slurm.schedmd.com/sbatch.html#readme) on the official SLURM man pages.
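
For orientation, a submission script passed to sbatch might look like the minimal sketch below. The job name, time limit, output file name, and program name are assumptions made for illustration, not site defaults. The partition and task count mirror the srun example above. The #SBATCH lines use standard sbatch options, and SLURM_JOB_ID and SLURM_JOB_PARTITION are environment variables that SLURM sets inside a batch job.

```shell
#!/bin/bash
# Name shown in the NAME column of squeue (assumed value).
#SBATCH --job-name=example
# Partition to run on, equivalent to the -p switch.
#SBATCH --partition=astro_devel
# Number of MPI tasks, equivalent to the -n switch.
#SBATCH --ntasks=20
# Wall-clock limit in hours:minutes:seconds (assumed value).
#SBATCH --time=00:10:00
# File for standard output; %j expands to the job ID.
#SBATCH --output=example-%j.out

# SLURM sets these variables inside a batch job. The fallbacks let the
# script also run outside the queueing system for a quick check.
job_id="${SLURM_JOB_ID:-none}"
partition="${SLURM_JOB_PARTITION:-none}"
summary="job ${job_id} on partition ${partition}"
echo "Starting ${summary}"

# Replace the line below with your own program (hypothetical name):
# srun ./my_program [args...]
```

Saved as job-submission-script.sh, the script would be submitted with sbatch job-submission-script.sh. Options given on the sbatch command line take precedence over the matching #SBATCH lines in the script.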