MPI BLAST

Data preparation

If your module environment is not initialized on login, initialize it with the following command:

# . /usr/modules/init/bash

Then load both openmpi and the mpiBLAST module:

# module load openmpi openmpi-apps/1.6.4/blast/1.6.0

Make sure that you have a .ncbirc file in your home directory. I set mine up like this:

[mpiBLAST]  
Shared=/mnt/home/rlyon/db/formatdb  
Local=/mnt/home/rlyon/db/formatdb
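
The directory that Shared and Local point to should exist before you run mpiformatdb. A minimal sketch of setting up the directory and the .ncbirc file from the shell, assuming the same ~/db/formatdb location used in this example:

mkdir -p ~/db/formatdb
cat > ~/.ncbirc <<EOF
[mpiBLAST]
Shared=$HOME/db/formatdb
Local=$HOME/db/formatdb
EOF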

Use the mpiformatdb program to format the data and break it into several database fragments: two fewer than the number of CPUs you can reserve, since mpiBLAST dedicates one process to scheduling and one to writing output. Search speed scales roughly linearly with the number of CPUs you can spread a large data set across. I’m using an arbitrary 16 total CPUs for this example, so I’m breaking the data into 14 fragments with one of the following scripts, submitted through the batch scheduler (the example file gp120.fasta may be found here):

mpiformatdb.pbs (Torque)

#!/bin/sh

# Load in modules  
. /usr/modules/init/bash  
module load openmpi ncbi

cd "$PBS_O_WORKDIR"
mpiformatdb -N 14 -i gp120.fasta

mpiformatdb.slurm (Slurm)

#!/bin/sh

# Load in modules  
. /usr/modules/init/bash  
module load openmpi ncbi

cd "$SLURM_SUBMIT_DIR"
mpiformatdb -N 14 -i gp120.fasta

Note: If you want to blast against our local NCBI BLAST data, you will first need to run the fastacmd program to dump the database back out as FASTA:

fastacmd -D 1 -d $BLASTDB/est_mouse > mouse_est

or dump and fragment it in one step:

fastacmd -D 1 -d $BLASTDB/est_mouse | mpiformatdb -i stdin -N 16 --skip-reorder -t est_mouse -p F

Submit the job that will break up the file by using the qsub command (or sbatch for Slurm):

# qsub mpiformatdb.pbs

or

# sbatch mpiformatdb.slurm

Once that is complete, you will see the following files in the directory you set for your local files in .ncbirc:

rlyon@fourtytwo ~/bio/blast $ ls ~/db/formatdb/  
gp120.fasta.000.phr  gp120.fasta.002.psq  gp120.fasta.005.psi  gp120.fasta.008.psd  gp120.fasta.011.pni  
gp120.fasta.000.pin  gp120.fasta.003.phr  gp120.fasta.005.psq  gp120.fasta.008.psi  gp120.fasta.011.psd  
gp120.fasta.000.pnd  gp120.fasta.003.pin  gp120.fasta.006.phr  gp120.fasta.008.psq  gp120.fasta.011.psi  
gp120.fasta.000.pni  gp120.fasta.003.pnd  gp120.fasta.006.pin  gp120.fasta.009.phr  gp120.fasta.011.psq  
gp120.fasta.000.psd  gp120.fasta.003.pni  gp120.fasta.006.pnd  gp120.fasta.009.pin  gp120.fasta.012.phr  
gp120.fasta.000.psi  gp120.fasta.003.psd  gp120.fasta.006.pni  gp120.fasta.009.pnd  gp120.fasta.012.pin  
gp120.fasta.000.psq  gp120.fasta.003.psi  gp120.fasta.006.psd  gp120.fasta.009.pni  gp120.fasta.012.pnd  
gp120.fasta.001.phr  gp120.fasta.003.psq  gp120.fasta.006.psi  gp120.fasta.009.psd  gp120.fasta.012.pni  
gp120.fasta.001.pin  gp120.fasta.004.phr  gp120.fasta.006.psq  gp120.fasta.009.psi  gp120.fasta.012.psd  
gp120.fasta.001.pnd  gp120.fasta.004.pin  gp120.fasta.007.phr  gp120.fasta.009.psq  gp120.fasta.012.psi  
gp120.fasta.001.pni  gp120.fasta.004.pnd  gp120.fasta.007.pin  gp120.fasta.010.phr  gp120.fasta.012.psq  
gp120.fasta.001.psd  gp120.fasta.004.pni  gp120.fasta.007.pnd  gp120.fasta.010.pin  gp120.fasta.013.phr  
gp120.fasta.001.psi  gp120.fasta.004.psd  gp120.fasta.007.pni  gp120.fasta.010.pnd  gp120.fasta.013.pin  
gp120.fasta.001.psq  gp120.fasta.004.psi  gp120.fasta.007.psd  gp120.fasta.010.pni  gp120.fasta.013.pnd  
gp120.fasta.002.phr  gp120.fasta.004.psq  gp120.fasta.007.psi  gp120.fasta.010.psd  gp120.fasta.013.pni  
gp120.fasta.002.pin  gp120.fasta.005.phr  gp120.fasta.007.psq  gp120.fasta.010.psi  gp120.fasta.013.psd  
gp120.fasta.002.pnd  gp120.fasta.005.pin  gp120.fasta.008.phr  gp120.fasta.010.psq  gp120.fasta.013.psi  
gp120.fasta.002.pni  gp120.fasta.005.pnd  gp120.fasta.008.pin  gp120.fasta.011.phr  gp120.fasta.013.psq  
gp120.fasta.002.psd  gp120.fasta.005.pni  gp120.fasta.008.pnd  gp120.fasta.011.pin  gp120.fasta.mbf  
gp120.fasta.002.psi  gp120.fasta.005.psd  gp120.fasta.008.pni  gp120.fasta.011.pnd  gp120.fasta.pal

Next, create the script that you will use to submit the mpiblast job (the example query file cons.fasta may be found here):

mpiblast.pbs (Torque)

#!/bin/sh
#PBS -l nodes=16

# Load in modules  
. /usr/modules/init/bash  
module load openmpi ncbi

cd "$PBS_O_WORKDIR"
mpirun mpiblast -p blastx -d gp120.fasta -i cons.fasta -o blast_results.txt --removedb

mpiblast.slurm (Slurm)

#!/bin/sh
#SBATCH -N 16

# Load in modules  
. /usr/modules/init/bash  
module load openmpi ncbi

cd "$SLURM_SUBMIT_DIR"
ulimit -l unlimited
mpirun mpiblast -p blastx -d gp120.fasta -i cons.fasta -o blast_results.txt --removedb

Note: with mpirun you do not need to specify the number of processors as you would on a standalone machine. OpenMPI has been compiled against the Torque and Slurm libraries and obtains the allocation directly from the Torque/Slurm server. Also note that you don’t need to boot an MPI ring; Torque/Slurm handles that automatically as well.
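
For reference, on a system whose OpenMPI lacks this scheduler integration you would have to pass the process count to mpirun yourself. A rough sketch of what that might look like (under Torque, $PBS_NODEFILE lists one line per allocated slot; under Slurm, $SLURM_NTASKS is only set when a task count was requested explicitly):

mpirun -np $(wc -l < "$PBS_NODEFILE") mpiblast -p blastx -d gp120.fasta -i cons.fasta -o blast_results.txt --removedb

or

mpirun -np "$SLURM_NTASKS" mpiblast -p blastx -d gp120.fasta -i cons.fasta -o blast_results.txt --removedb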

Submit it and check the output files to see if it has worked correctly.

# qsub mpiblast.pbs

or

# sbatch mpiblast.slurm
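
To watch the job while it is queued or running, and to take a quick look at the results once it finishes (blast_results.txt is the output file named in the scripts above):

# qstat -u $USER

or

# squeue -u $USER

then, once the job has completed:

# head blast_results.txt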

Exercise 1

Create a generic wrapper script that takes two arguments, the sequence file and the query file, and submits the mpiformatdb and mpiblast jobs on those files so that the mpiblast job will not run until the mpiformatdb job has completed.

Solution - http://www.ibest.uidaho.edu/wiki/index.php/Solution:_Using_mpiBLAST_-_Exercise_1
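
Hint: both schedulers can chain jobs on completion. Below is a minimal sketch of one possible Torque approach, assuming the two PBS scripts above have been edited to read the hypothetical DB and QUERY environment variables instead of hard-coded file names; the Slurm equivalent would capture the job ID from sbatch --parsable and pass --dependency=afterok:<jobid> to the second sbatch.

#!/bin/sh
# Usage: ./run_mpiblast.sh <sequence.fasta> <query.fasta>
DB=$1
QUERY=$2

# qsub prints the new job's ID on stdout; capture it
FORMAT_JOB=$(qsub -v DB="$DB" mpiformatdb.pbs)

# Hold the blast job until the formatdb job finishes successfully
qsub -v DB="$DB",QUERY="$QUERY" -W depend=afterok:"$FORMAT_JOB" mpiblast.pbs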