Quick start guide

"Nothing is better than the wordless teaching and the advantage of non-action"

 

Basics

The cluster consists of a front-end and a number of computational nodes. The user submits a job from the front-end, and the queueing system takes care of reserving the requested resources and running the job when the time is optimal. Advanced job scheduling assures fair access for everybody and optimizes the overall system efficiency.

 

Access

To use the cluster one has to 'ssh' to its frontend node - chuck. It is accessible to every user of the CAMK network - there is no separate account. However, to use the cluster you have to contact cluster@camk.edu.pl in order to

  • be assigned to a proper accounting group (usually camk or guest)
  • set up /work/chuck/<user> directory and quota (if needed).

(please state whether you are an employee, guest, student, or member of some group, and how much space you need - by default employees get 1 TB).

 

The frontend can be used to:

  • submit and monitor jobs,
  • develop code (compilation and a few-minute-long debugging runs).

The frontend is equipped with 20 CPU cores and 64 GB of memory. Please read the messages displayed after login - they contain current, important announcements. Do not run long or memory-hungry codes on the frontend. There are certain limits set ('ulimit -a') and the system will kill processes which violate them.

 

Storage

The cluster provides a high-performance cluster filesystem - BeeGFS:

  • /work/chuck - 550 TB volume, backed up every day (only a mirror of the previous day is kept).

It should be used for all cluster activities because it is much faster than the other 'work' filesystems at CAMK. It is also visible on all workstations (but outside of the cluster the access is slower).

Please note that performance has priority over data safety on /work/chuck! Use it to store simulation/analysis results, but not as the only copy of codes, papers, etc.
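
To see how much space you are using and what your quota is, you can query BeeGFS directly - a minimal sketch, assuming the standard BeeGFS client tools are installed on the frontend:

# quota and current usage for your user id (assumes beegfs-ctl is available)
beegfs-ctl --getquota --uid $(id -u)
# a plain size check of your directory also works, just slower
du -sh /work/chuck/$USER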

 

 

Software

Aside from the standard Linux packages, other software is available:

  • Software collections (devtoolset with a newer gcc, python); to get the current list: scl -l
  • MPI implementations - these have to be loaded with the module command; to list the options type: module avail (mvapich2-2.2 is recommended; see the example below)
  • Other, locally compiled software/libraries can be found in the /opt directory (fftw3, hdf5, gsl, ...)
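
For example, a typical session that picks up a newer compiler and the recommended MPI could look as follows (a sketch - the collection and module names are the ones mentioned above and may change over time):

scl -l                         # list the available software collections
scl enable devtoolset-9 bash   # start a shell with the newer gcc
module avail                   # list the available modules
module load mvapich2-2.2       # load the recommended MPI implementation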

 

Compiling codes and running jobs

Chuck uses the SLURM queueing system.

A simple compilation can be done directly on the frontend, but if you plan to do more compilation work (development/debugging) it is recommended to submit an interactive job and do it on a node. The command

srun --pty bash

will start an interactive bash session on a node.
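
By default such a session gets minimal resources. You can request more explicitly, for example (a sketch - adjust the partition, core count, memory and time to your needs):

srun -p short -N 1 -n 1 -c 4 --mem=8G --time=02:00:00 --pty bash
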
The default GNU compilers are quite old - it is recommended to use newer versions available as Software Collections, e.g.:

scl enable devtoolset-9 bash

This command opens a new shell and sets all environment variables to use the newer compiler (here gcc 9.x).
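
You can quickly check that the new shell really picks up the newer compiler:

gcc --version   # should report gcc 9.x inside the devtoolset-9 shell
which gcc       # typically points into the devtoolset tree instead of /usr/bin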

Warning! Be careful when compiling on chuck (the frontend) with the option -march=native. The frontend is newer than some nodes and code compiled this way will fail on them. To optimize for all nodes use -march=sandybridge. To optimize for the newer nodes use -march=native or explicitly -march=broadwell AND request appropriate nodes in the job script using the option '-C broadwell'. If you use neither -march nor -mtune, the code will run everywhere. Please check the hardware section for the available options.
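
In practice this gives two safe recipes (ordinary gcc invocations; my_code.c stands for your own source file):

# portable build - runs on every node of the cluster
gcc -O2 -march=sandybridge -o my_code my_code.c

# build optimized for the newer nodes only; the job script must then
# also request such nodes with:  #SBATCH -C broadwell
gcc -O2 -march=broadwell -o my_code my_code.c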

 

When the code is compiled you can submit a job. If you are in an interactive job, finish it by logging out. Prepare a job script like this:

 

#! /bin/bash -l
## job name
#SBATCH -J testjob
## number of nodes
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1GB
#SBATCH --time=01:00:00
## partition (queue) to use
#SBATCH -p short
#SBATCH --output="stdout.txt"
#SBATCH --error="stderr.txt"

## commands/steps to execute
# go to the submission directory
cd $SLURM_SUBMIT_DIR
hostname > out.txt
my_code >> out.txt

and submit the job with the command 'sbatch <script_name>'. Important differences with respect to PBS:

  • the job script should start with a shell line, e.g. #! /bin/bash -l
  • ALL environment variables from the submission shell are passed by default to the job (see the note below).
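
If you prefer the job to start with a clean environment instead of inheriting everything from the submission shell, you can control this with the --export option of sbatch, e.g.:

## do not pass the submission environment to the job
#SBATCH --export=NONE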

 

Some useful SLURM commands (for details see the relevant man pages; typical invocations are shown below the list):

  • sinfo - list available partitions
  • squeue - list jobs
  • scancel - remove a job
  • sacct - accounting info about completed and running jobs
  • scontrol show partition - details of partitions
  • sshare -la - fairshare records per account and per user
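
Typical invocations (the job ID 12345 is just a placeholder):

squeue -u $USER                  # show only your jobs
scancel 12345                    # remove job 12345 from the queue
sacct -j 12345 --format=JobID,State,Elapsed,MaxRSS   # resource usage of a job
scontrol show job 12345          # full details of a pending/running job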


 

Support

Email: cluster@camk.edu.pl
Please provide the job ID and job script location when applicable.