News

22.04.2014 Major upgrade of the psk cluster!

The psk cluster is back after the upgrade. Summary of the biggest changes:

  1. hostname psk2 changed to psk; /work/psk2 became /work/psk (backup restored, rw access)
  2. the Lustre filesystem under /work/psk has grown to 140 TB with an aggregated bandwidth of 2.5 GB/s
  3. there are 10 new nodes with 16 CPUs each (much faster Intel Xeons vs. the old AMD Opterons)
  4. 4 of the 10 new nodes have fast GPU units (GTX Titan)
  5. Linux distribution: Scientific Linux 6 replaced CentOS 5
  6. New developer software & libraries were installed

Reminder: the contact email for the cluster is psk@camk.edu.pl

More details:

  • Ad. 1 - remember to change the hostname & paths in your scripts (see the example sketch after this list); sorry for the inconvenience, but it makes for a 'cleaner' setup
  • Ad. 1 - /work/psk is available read-write and contains an exact copy of /work/psk2; the read-only backup currently served as /work/psk2 will be switched off and the backup server will be reinstalled as well - this means there will be no up-to-date backup for around 1 week!
  • Ad. 2 - quotas have been increased depending on the space currently occupied by each user; check the numbers displayed after logging in to psk and ask if you need less or more (see the quota example after this list)
  • Ad. 2 - reminder: Lustre is made for big files (>10 MB); a single thread can write at ~600 MB/s, but the total bandwidth from all threads on the cluster is ~2.5 GB/s (a simple write test is sketched after this list)
  • Ad. 2 - there is only 70 TB for backup; as a reminder, /work/psk is mirrored every day at 5 am; still, data security is not a priority here - codes and important data should be archived elsewhere
  • Ad. 3 - all new nodes have the same CPUs: Intel Xeon 2.4 GHz - this is important for parallel applications; to avoid mixing them with the old, slow CPUs there will be two cluster partitions: a new one with 160 CPUs (already available, default) and an old one with ~128 CPUs (I still have to configure the queueing system for it); the old partition will be the default only for the interactive queue or after an explicit request via a qsub option (see the job-script sketch after this list)
  • Ad. 3 - queueing limits, such as the total number of jobs per user, were increased and will be tuned over the next months
  • Ad. 4 - CUDA for the GPU nodes will be installed soon (if you are interested in using GPUs, let me know)
  • Ad. 5 - Practically, both are free versions of Red Hat Enterprise Linux; this is a major change in compiler and library versions - it is strongly advised to recompile your codes
  • Ad. 6 - The newest version of the PGI compilers is installed (pgcc, pgc++, pgf77, pgf95)
  • Ad. 6 - Red Hat Software Collections are installed. This is a framework for running software in versions different from those included in the distribution, e.g. gcc, python. Type 'scl -l' to list the available packages (see the SCL example after this list). SCLs are available on the front-end and on the nodes. SCLs are not supported by me - please read the documentation at http://linux.web.cern.ch/linux/scl and http://linux.web.cern.ch/linux/devtoolset . Right now there are:
    • Developer Toolset - newest versions of gcc (4.8), gfortran, eclipse, valgrind, etc.; to activate, type: 'scl enable devtoolset-2 bash'
    • Python 3.3 - to activate, type 'scl enable python33 bash'
  • Ad. 6 - MPI users: for now only mvapich2 and openmpi compiled with GNU are available; the mpi-selector configuration is not working anymore - use modules instead, e.g. 'module avail', 'module list', 'module load mvapich2-x86_64' (you can add this to your shell init scripts); the old /opt/mpiexec/bin/mpiexec will not work - use the specially recompiled /opt/hydra/bin/mpiexec.hydra process manager instead (a minimal job-script sketch follows this list); in a few days I will add more detailed info on using MPI to the User guide
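
Example for Ad. 1 - one possible way to update the old paths in your scripts; the ~/scripts directory is only a placeholder for wherever your job scripts live, so check the grep output before running the sed step:

    # list files that still reference the old path (placeholder directory)
    grep -rl '/work/psk2' ~/scripts
    # replace the old path with the new one in those files
    grep -rl '/work/psk2' ~/scripts | xargs sed -i 's|/work/psk2|/work/psk|g'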
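
Example for Ad. 2 (quota) - one way to check your current usage and limits from the command line, assuming the standard Lustre client tools are available; the numbers printed at login remain the authoritative source:

    # show disk usage and quota for your user on the Lustre filesystem
    lfs quota -u $USER /work/psk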
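
Example for Ad. 2 (bandwidth) - a rough single-thread write test; the path under /work/psk is a placeholder for your own directory, the measured speed depends on the current load, and the test file should be removed afterwards:

    # write 4 GB in 1 MB blocks and report the achieved speed
    dd if=/dev/zero of=/work/psk/$USER/ddtest bs=1M count=4096 conv=fdatasync
    rm /work/psk/$USER/ddtest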
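
Example for Ad. 3 - a minimal sketch of requesting a specific partition in a Torque/PBS job script; the queue name 'old', the core count and the program name are placeholders, since the actual queue names will only be known once the queueing system is reconfigured:

    #!/bin/bash
    #PBS -N myjob              # placeholder job name
    #PBS -l nodes=1:ppn=8      # adjust the core count to the node type
    #PBS -q old                # placeholder queue name for the old partition
    cd $PBS_O_WORKDIR
    ./my_program               # placeholder executable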
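
Example for Ad. 6 (Software Collections) - typical SCL usage; 'scl enable ... bash' starts a subshell with the collection in the environment, while the quoted-command form runs a single command without an interactive shell:

    scl -l                                   # list installed collections
    scl enable devtoolset-2 bash             # interactive shell with gcc 4.8 etc.
    scl enable python33 bash                 # interactive shell with Python 3.3
    scl enable devtoolset-2 'gcc --version'  # run one command inside the collection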
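
Example for Ad. 6 (MPI) - a minimal sketch of compiling and running with mvapich2 through modules and mpiexec.hydra; the core count and program name are placeholders, and the definitive recipe will appear in the User guide:

    # set up the MPI environment (this can go into your shell init scripts)
    module load mvapich2-x86_64
    # compile
    mpicc -O2 -o hello_mpi hello_mpi.c
    # inside a PBS job script: start the processes with the Hydra process manager
    /opt/hydra/bin/mpiexec.hydra -machinefile $PBS_NODEFILE -n 32 ./hello_mpi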

To do:

  • upgrade backup server

Future: the cluster can't be financed from grant overheads any more - please contact me (pci@camk.edu.pl) if you can include computing nodes or storage in your grant application!