Job examples

General rules & script format

  • Each line beginning with #SBATCH is interpreted as an sbatch/srun command line option - see 'man sbatch' for available options.
  • Other lines beginning with # are comments (e.g. ##SBATCH is a comment).
  • All other lines are executed as in an ordinary shell script.
  • A SLURM job is composed of one or more steps. If you do not use srun inside the job script, everything is executed in a single step. In general, however, you can have many subsequent srun commands, each using a subset of the allocated resources (see the MPI example). Using steps gives much more flexibility and allows you to monitor job progress with the sacct command (see the example after this list).
  • With the option '--mail-type=END', after the job completes you will get an email with a summary of consumed resources - it can help you tighten the job limits.
  • The full SLURM documentation is available at https://slurm.schedmd.com/
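
Each srun invocation in the script creates a separate job step that can be inspected during or after the run; a minimal sketch (the job id 123456 is hypothetical):

sacct -j 123456 --format=JobID,JobName,State,Elapsed,MaxRSS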

 

SLURM accounts

To submit a job, the user has to be assigned to a SLURM account, which limits the usage of resources. By default all users are assigned to the camk or guest account. In addition, for groups which made a substantial financial contribution, there are separate accounts with a higher fairshare. The default account is used automatically; to select another account use the option '-A'.
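
For example, to submit under a non-default account (the account name mygroup is hypothetical):

sbatch -A mygroup job.sh

or, equivalently, inside the job script:

#SBATCH -A mygroup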
 

 

Note: all examples below assume bash as a user shell.

 


Serial jobs

A simple script with comments

 

#! /bin/bash -l
## Job name
#SBATCH -J testjob
## Allocate N nodes
#SBATCH -N 1
## ntasks per node (= number of processes = number of cores)
#SBATCH --ntasks-per-node=1
## memory per core
#SBATCH --mem-per-cpu=1GB
## maximum time (HH:MM:SS)
#SBATCH --time=01:00:00
## partition (queue) to use
#SBATCH -p short
## stdout
#SBATCH --output="stdout.txt"
## stderr
#SBATCH --error="stderr.txt"
## Use account (required only if different from the default camk)
#SBATCH -A camk
##send an email when done
#SBATCH --mail-type=END

## the job starts in $HOME; go to the submission directory
cd $SLURM_SUBMIT_DIR
# run the code, redirecting its stdout and stderr to a file
./my_code >& out.txt

 

Important: Please note that the standard output and error streams of the code are redirected to a file, despite the stdout/stderr specification for the job. This is very important unless your code writes less than a few MB to stdout/stderr. The job output is spooled locally on the execution node and copied to the user's working directory only after the job completes. Since the spool size is small (a few GB), you can overfill the disk and crash all the jobs on the node. With the redirection approach you avoid this, and in addition you can monitor out.txt during runtime.
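
To submit the script and monitor it (the script name serial.sh and the job id placeholder are hypothetical):

sbatch serial.sh
squeue -u $USER          # list your pending and running jobs
sacct -j <jobid>         # inspect steps and consumed resources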

Array of serial jobs

It is possible to start N copies of a job using the option '-a' (array). In the example below we start 100 jobs with ids ranging from 0 to 99. The id is available inside the job as the shell variable SLURM_ARRAY_TASK_ID and can be used to parameterize the job (see the sketch after the list of ranges below).

 

#! /bin/bash -l
#SBATCH -J testarr
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p short
#SBATCH --open-mode=append
#SBATCH --output="stdout-%a.txt"
#SBATCH --error="stderr-%a.txt"
#SBATCH -A camk
#SBATCH -a 0-99

cd $SLURM_SUBMIT_DIR
echo "task $SLURM_ARRAY_TASK_ID on host $(/bin/hostname)" >> out-$SLURM_ARRAY_TASK_ID.txt

More sophisticated ranges are possible:

  • -a 0,6,16-32
  • -a 0-15:4    (use step 4, equivalent to -a 0,4,8,12)
  • -a 0-499%4  (500 jobs but only 4 can run simultaneously - please consider this if you have disk-intensive jobs, to prevent a filesystem slowdown)
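
The array index is typically used to select a per-task input; a minimal sketch for the last line of the array script above (the program name and input file names are hypothetical):

./my_code input-${SLURM_ARRAY_TASK_ID}.dat >& out-${SLURM_ARRAY_TASK_ID}.txt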



Parallel MPI jobs

Parallel jobs must use the para queue. SLURM was designed primarily to run parallel jobs, so there is no need for a separate launcher - it is built into the srun command; mpiexec is supported as well. In the example below we allocate 8 tasks, but in the first step only one task is used to compile the code. In the second step we start the MPI application on all allocated resources using the mechanism built into srun - the option --mpi=pmi2 is required!

 

#! /bin/bash -l
#SBATCH -J testmvapich2
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p para
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
module purge
module add mpi/mvapich2-2.2-x86_64
# a serial step: compile the code using a single task
srun -n 1 mpicc -o mpi-test mpi-test.c
srun --mpi=pmi2 ./mpi-test
# the above works for MVAPICH2; if you use OpenMPI, use mpiexec as the launcher:
# mpiexec ./mpi-test
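
For OpenMPI the compile and launch steps would look like this instead (the module name is hypothetical - check 'module avail' for the exact one on the cluster):

module purge
module add mpi/openmpi-x86_64
srun -n 1 mpicc -o mpi-test mpi-test.c
mpiexec ./mpi-test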


Parallel OpenMP jobs

This also applies to Mathematica jobs (however, do not use more than 4 threads).

#! /bin/bash -l
#SBATCH -J testopenmp
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 10
#SBATCH --mem-per-cpu=500MB
#SBATCH --time=00:10:00
#SBATCH -p para
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time ./compute_pi

Please note the combination of --ntasks-per-node=1 and -c 10: for OpenMP jobs you ask for one task and multiple CPUs (threads). It will not work if you just ask for 10 tasks per node as for an MPI job.


Interactive jobs

To start an interactive job (e.g. to compile, test, debug, profile the code) use:

srun -p interactive --pty bash
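
If the default limits are too small, the usual resource options apply to interactive jobs as well; a sketch (the values are arbitrary):

srun -p interactive -c 4 --mem-per-cpu=2GB --time=02:00:00 --pty bash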

 

GPU jobs

GPU cards are available as generic resources. To ask for a specific GPU architecture use:

srun -p gpu --gres=gpu:kepler:1 --pty bash

This command starts an interactive session on the gpu partition, allocating 1 GPU with the Kepler architecture. You can verify GPU availability with the nvidia-smi command. For a list of available architectures and node configurations please refer to the Hardware page.
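
GPUs can be requested in a batch script in the same way; a minimal sketch (the program name is hypothetical; drop the architecture name from --gres if any GPU will do):

#! /bin/bash -l
#SBATCH -J testgpu
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=01:00:00
#SBATCH -p gpu
#SBATCH --gres=gpu:kepler:1
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"
#SBATCH -A camk

cd $SLURM_SUBMIT_DIR
# run the GPU code, redirecting its output as in the serial example
./my_gpu_code >& out.txt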