Cluster

So far, we have assumed that you're logged into a standalone server and can run applications directly. Now let's look at how to use the fortyfour cluster.

ssh username@fortyfour.ibest.uidaho.edu

Note: Student account users do not need to log into fortyfour - you can submit jobs from the standalone servers. Notice that your home directory looks the same:

ls

We're going to be creating a bunch of temporary files, so create a new directory in Lustre and cd into it (substitute your username):

mkdir /mnt/lfs2/benji/workshop && cd /mnt/lfs2/benji/workshop

Please do not run computationally intensive jobs on the head node.

To avoid running computationally intensive jobs on the head node, also log into a standalone server in another tab/window. If you're using the classroom cluster, don't worry about this step.

ssh username@zaphod.ibest.uidaho.edu

To run applications on the cluster, you need to use sbatch. sbatch is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission script or command file. Several of the most widely used arguments are described in detail here. The best way to use sbatch is to write bash scripts. Example sbatch script (save it as rand_nums.slurm):

#!/bin/bash

cd "$SLURM_SUBMIT_DIR"

echo running
source /usr/modules/init/bash
module load R
Rscript -e "rnorm(10)"
sleep 30
echo finished

Submit the job

sbatch rand_nums.slurm

Check to see if it’s running

squeue

Output:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  282   primary rand_num     user PD       0:00      1 (None)

The job will produce an output file named slurm-<job number>.out, which contains the output that would otherwise go to the screen.

benji@fortyfour ~/workshop $ cat slurm-282.out
running
 [1] -1.5552951 -1.8221806  0.5190432 -1.2447830 -0.5147968 -1.5253791
 [7]  0.8816124  0.6505836 -1.2168808 -1.2094903
finished
benji@fortyfour ~/workshop $

Partitions

fortyfour ~ # scontrol show partitions
PartitionName=tiny
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=06:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[067-103],n[107-109]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=776 TotalNodes=40 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=short
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=short
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[067-103],n[107-109]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=776 TotalNodes=40 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=volatile
   AllowGroups=gratis AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=5-08:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n018,n[044-052],n062,n[094-101]
   PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=224 TotalNodes=19 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=testded
   AllowGroups=sysad AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n018,n062
   PriorityJobFactor=1 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=24 TotalNodes=2 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=reg
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=reg
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[001-017],n[019-061],n063,n064,n[067-090],n[094-103],n107
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=1244 TotalNodes=97 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=long
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=long
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[001-017],n[019-061],n063,n064,n[067-090],n[094-103],n107
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=1244 TotalNodes=97 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=cmci-gpu
   AllowGroups=cmci AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[065-066]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=80 TotalNodes=2 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=gpu-short
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=gpu-short
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n104
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
   DefMemPerCPU=8000 MaxMemPerNode=UNLIMITED

PartitionName=gpu-long
   AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=gpu-long
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n105,n106,n110,n111,n112,n113
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
   State=UP TotalCPUs=56 TotalNodes=2 SelectTypeParameters=NONE
   DefMemPerCPU=8000 MaxMemPerNode=UNLIMITED

There are now several partitions to pick from, including two GPU partitions (gpu-short and gpu-long) if your job can make use of GPUs. Please choose the partition that most closely matches the requirements of your jobs. If you have a free account (gratis), the only available partition is the 'volatile' partition. Jobs in the volatile partition are subject to preemption.

Partition   Wall-time   Max-Jobs   Nodes
tiny        6 hours     no-limit   38
short       24 hours    1000       38
reg         168 hours   500        105
long        infinite    50         105
gpu-short   24 hours    4          1
gpu-long    168 hours   6          6
volatile    128 hours   50         19
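
To target a specific partition, pass -p on the command line or as a directive inside the script. A quick sketch (myjob.slurm is just a placeholder name - pick a partition your account is allowed to use):

# On the command line:
sbatch -p short myjob.slurm

# Or as a directive inside the script:
#SBATCH -p short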

Now let’s use sbatch for aligning RNA-Seq data. First we’ll load the ncbi-sra module:

module load ncbi-sra

Next, get some data from NCBI. Search for "RNA seq", then filter to Mus musculus. Use wget to download:

wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/something..something..not..actually..going..to..work../SRR2029055.sra

or better yet, use the SRA toolkit:

prefetch SRR2353195

(If you use the toolkit, the downloaded file will be at ~/ncbi/public/sra/SRR2353195.sra.) Or you can just copy the version I downloaded:

cp /mnt/ceph/benji/ncbi/public/sra/SRR2353195.sra ~/workshop/

Use the ncbi-sra toolkit to split the .sra file into fastq (run this one on the standalone and update file paths as appropriate):

cd ..
fastq-dump --outdir workshop --split-files ncbi/public/sra/SRR2353195.sra
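
Optionally, peek at the first record to confirm the conversion worked (a FASTQ record is four lines):

head -n 4 workshop/SRR2353195_1.fastq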

Tophat needs an indexed genome to map against - use the one I already downloaded (it’s pretty big). Make an sbatch script (update as appropriate):

#!/bin/bash
#SBATCH -J benji_tophat
#SBATCH -p volatile
#SBATCH --mem=32G

cd "$SLURM_SUBMIT_DIR"

source /usr/modules/init/bash
module load tophat
tophat -p 1 /mnt/ceph/data/Mus/mm10/Sequence/Bowtie2Index/genome SRR2353195_1.fastq

and submit

benji@fortyfour ~/workshop $ sbatch benji_tophat.slurm 
Submitted batch job 290
benji@fortyfour ~/workshop $ squeue
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           290   primary benji_to    benji  R       0:13      1 n090

What's the advantage of running tophat with sbatch in this manner versus on a standalone server? Answer: none, really - except that there are many more cluster nodes than there are standalone servers. The advantage of sbatch comes when you have a bunch of data/reads, or can split up your data.

This tophat run will actually go for quite a while (hours) - so let’s delete the job. First, use squeue to get the job number:

benji@fortyfour ~/workshop $ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  290   primary benji_to     user PD       0:00      1 (None)

and now delete it:

scancel 290
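
If you ever need to clear out everything you have queued or running, scancel also accepts a username:

scancel -u $USER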

Let's split up our sequence file and spread it out across the nodes in the cluster.

split -d -l 2000000 SRR2353195_1.fastq SRR2353195_1.fastq.

This will split the file into chunks of 2 million lines (500,000 reads, since each FASTQ record is four lines), named SRR2353195_1.fastq.00, SRR2353195_1.fastq.01, and so on. The -d flag tells split to use decimal suffixes, which we will need. We're going to use a very helpful sbatch feature - arrays. The sbatch file will look like:

#!/bin/bash
#SBATCH -J b_th_a
#SBATCH -p volatile
#SBATCH --mem=150G

cd "$SLURM_SUBMIT_DIR"

#Left-pad with zeros as necessary
FIXED_A_ID=$(printf '%02d' $SLURM_ARRAY_TASK_ID)

echo "Running $SLURM_ARRAY_TASK_ID on $(hostname)"
source /usr/modules/init/bash
module load tophat
tophat -p 1 -o ./tophat_out_$FIXED_A_ID  /mnt/ceph/data/Mus/mm10/Sequence/Bowtie2Index/genome SRR2353195_1.fastq.$FIXED_A_ID

and submit (adjust the number below for how many files you have), and you should be able to see all the jobs that are spawned:

benji@fortyfour ~/workshop $ sbatch -a 0-19 tophat_array.slurm
Submitted batch job 291
benji@fortyfour ~/workshop $ squeue
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   291_[11-19]   primary   b_th_a    benji PD       0:00      1 (Resources)
         291_0   primary   b_th_a    benji  R       0:05      1 n090
         291_1   primary   b_th_a    benji  R       0:05      1 n091
         291_2   primary   b_th_a    benji  R       0:05      1 n092
         291_3   primary   b_th_a    benji  R       0:05      1 n093
         291_4   primary   b_th_a    benji  R       0:05      1 n094
         291_5   primary   b_th_a    benji  R       0:05      1 n095
         291_6   primary   b_th_a    benji  R       0:05      1 n096
         291_7   primary   b_th_a    benji  R       0:05      1 n097
         291_8   primary   b_th_a    benji  R       0:05      1 n098
         291_9   primary   b_th_a    benji  R       0:05      1 n099
        291_10   primary   b_th_a    benji  R       0:05      1 n100
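
Each array task writes its own output file; by default these are named slurm-<job id>_<task id>.out, so you can check on an individual task from the example above with something like:

cat slurm-291_0.out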

Other sbatch options

Try it out

Create a simple submission file and call it sleep.slurm:

#!/bin/sh

for i in `seq 1 60` ; do
        echo $i
        sleep 1
done

Then submit your job with the output file renamed to sleep.log:

sbatch -o sleep.log sleep.slurm

Submit your job with the standard error file renamed:

sbatch -e sleep.log sleep.slurm

Send standard output and standard error to different files:

sbatch -o sleep.log -e sleep.err sleep.slurm

Place the output in another location other than the working directory:

sbatch -o $HOME/tutorials/logs/sleep.log sleep.slurm

Mail job status at the start and end of a job

The mailing options are set using the --mail-type argument. This argument sets the conditions under which the batch server will send a mail message about the job, and --mail-user defines the user that emails will be sent to. The conditions for the --mail-type argument include:

FAIL: mail is sent when the job is aborted.
BEGIN: mail is sent when the job begins.
END: mail is sent when the job ends.

For example, sleep.slurm:

#!/bin/bash
#SBATCH -J sleep
#SBATCH -o myoutput.log
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=rlyon@uidaho.edu

for i in `seq 1 30` ; do
     echo "Hello from stdout on $HOSTNAME: $i"
     echo "Hello from stderr on $HOSTNAME: $i" 1>&2
     sleep 1
done

Try it out

Using the sleep.slurm script above, submit a job that emails you under all three conditions:

sbatch sleep.slurm

Submit a job that uses specific resources

For now, let's look at resource requests. By default your job will get 1 thread on one node, but you can request more.

sbatch -N num_nodes -n num_cores sleep.slurm

The -N option is the number of nodes, and -n is the number of processors/cores. Note that just requesting more than one node does not mean that your job will run on more than one node - for that you typically need to use MPI. Some software will attempt to use all cores available on the compute node, in which case it is best to request 16 cores (the most common number of cores present).
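
For example, here is a variation on the earlier tophat script that requests 16 cores on a single node and tells tophat to use all of them. This is only a sketch - the job name and output directory are arbitrary, and you should adjust the partition and core count to match what your account and the nodes actually offer:

#!/bin/bash
#SBATCH -J threaded_tophat
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -p volatile

cd "$SLURM_SUBMIT_DIR"

source /usr/modules/init/bash
module load tophat
# Match the thread count (-p) to the number of cores requested with -n
tophat -p 16 -o ./tophat_out_threads /mnt/ceph/data/Mus/mm10/Sequence/Bowtie2Index/genome SRR2353195_1.fastq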

Using an array job to process a bunch of files

Often you will find that you need to run the same task for a bunch of input files, and an array job is a good way to do this. The trick is translating the $SLURM_ARRAY_TASK_ID variable (which is just an integer) into a file name. There are a couple ways to do this. The first is to just name your files with a sequence of integers, eg:

inputfile.1.dat
inputfile.2.dat
inputfile.3.dat
etc...
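
With integer-suffixed names like these, the array script can translate the task ID directly into a file name. A minimal sketch (some_tool and the inputfile.*.dat names are placeholders, not real programs or data on the cluster):

#!/bin/bash
cd "$SLURM_SUBMIT_DIR"

# Each array task processes the file whose suffix matches its task ID
some_tool inputfile.${SLURM_ARRAY_TASK_ID}.dat > inputfile.${SLURM_ARRAY_TASK_ID}.out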

However, you can keep your existing file names by using the following technique: create an index file that maps integers to file names using a bash array.

Here's the script that builds the index file and submits the job:

#!/bin/bash

if [ -z "$1" ] ; then echo "need to specify a directory"; exit 1; fi

if [ -f test.list ]; then rm test.list ; fi

# Build an index file mapping integers to file names via a bash associative array
fcount=0
echo "declare -A files" > test.list
for file in "$1"/* ; do
    echo "files[$fcount]=$file" >> test.list
    let fcount=fcount+1
done

# sbatch options must come before the script name; valid array indexes run 0 to fcount-1
sbatch -J testA -a 0-$((fcount-1)) testA.slurm

The slurm script (testA.slurm):

#!/bin/bash
cd "$SLURM_SUBMIT_DIR"

# Pull in the bash array built by the setup script
source test.list
source /usr/modules/init/bash
module load some_module
some_module_binary ${files[$SLURM_ARRAY_TASK_ID]} > outdir/$(basename ${files[$SLURM_ARRAY_TASK_ID]} ".dat").out
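
Assuming the setup script above is saved as submit_all.sh (a name chosen here just for illustration) and your .dat files live in a directory called data, you would kick everything off with:

mkdir -p outdir
bash submit_all.sh data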

Passing an environment variable to your job

You can pass user-defined environment variables to a job by using the --export argument.

Try it out

To test this, we will use a simple script that prints out an environment variable (variable.slurm):

#!/bin/sh
if [ "x" == "x$MYVAR" ] ; then
     echo "Variable is not set"
else
     echo "Variable says: $MYVAR"
fi

Next, use sbatch without --export and check your standard output file:

sbatch variable.slurm

Then use --export to set the variable:

sbatch --export=MYVAR=some_value variable.slurm

MPI

We have several applications compiled for MPI (Message Passing Interface).

Let’s look at an example with ABySS. First, get the test data.

wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz
tar xzvf test-data.tar.gz

then the sbatch script (abyss-mpi.slurm):

#!/bin/sh
#SBATCH -N 2
#SBATCH -p volatile

. /usr/modules/init/bash
module load openmpi-apps/1.10.2/abyss/1.9.0

# The np=? parameter must match the number of nodes allocated in the SBATCH line.
abyss-pe np=2 k=25 name="test" in='test-data/reads1.fastq test-data/reads2.fastq'
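
Submit it like the earlier jobs:

sbatch abyss-mpi.slurm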

Here's an example with RAxML. First, grab some data:

wget http://www.hpc.uidaho.edu/example-data/dna.phy

The sbatch script (raxml-mpi.slurm):

#!/bin/bash
#SBATCH -J raxml-mpi-tester
#SBATCH -N 4
#SBATCH -p volatile

source /usr/modules/init/bash
module load openmpi-apps/1.10.2/raxml/8.2.8

cd "$SLURM_SUBMIT_DIR"

ulimit -l unlimited
mpirun raxmlHPC-MPI -f a -s dna.phy -p 12345 -x 12345 -# 10 -m GTRGAMMA -n T$RANDOM

submit

sbatch raxml-mpi.slurm

Your projects