BLAST

To use BLAST, load the ncbi-blast module:

benji@marvin2 ~ * module load ncbi-blast
benji@marvin2 ~ * blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.6.0+

Use '-help' to print detailed descriptions of command line arguments
benji@marvin2 ~ *

Databases

On our newest standalone servers (marvin, trillian, zaphod, ford, formic) we have a local BLAST database that is updated regularly. Historically, we maintained a BLAST database at /data/blast/db, but concern about updating it while someone was using it hampered regular updates. Our new solution uses file system snapshots so that the CRC can update the BLAST database from NCBI regularly. To use the local BLAST database, run the blastdblink command. For example:

benji@marvin2 ~ * blastdblink 
Creating symbolic link from blastdb to autosnap_2018-08-10_15:10:48_monthly ...

Now there is a link to a snapshot of the NCBI BLASTDB (nr, nt for now) in the current directory:

benji@marvin2 ~ * ls -lh
total 12G
-rw-r--r-- 1 benji sysad 2.8M Aug 28 11:37 abyss_test.tar.bz2
-rw-r--r-- 1 benji sysad   27 May 23 09:39 anewfile
lrwxrwxrwx 1 benji sysad   68 Aug 30 10:06 blastdb -> /zdata/blastdb/.zfs/snapshot/autosnap_2018-08-10_15:10:48_monthly/db
-rw-r--r-- 1 benji sysad  209 Jan 30  2018 sleep.slurm
-rw-r--r-- 1 benji sysad   90 Jan 30  2018 slurm-322020.out
-rw-r--r-- 1 benji sysad  263 Feb  2  2018 slurm-324548.out
-rw-r--r-- 1 benji sysad 1.4K Feb  2  2018 slurm-324549.out
drwxr-xr-x 1 benji sysad   87 Mar 16  2017 workshop
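
A symbolic link can be left dangling if its target goes away, so it can be worth confirming that the link still resolves before starting a long search. The following is a minimal sketch, not part of blastdblink: the function name check_blastdb_link is hypothetical, and it assumes the default link name blastdb when no argument is given.

```shell
# Hypothetical helper: succeed only if the given path is a symbolic link
# whose target is an existing directory.
check_blastdb_link() {
    link="${1:-blastdb}"    # default to the link name blastdblink creates
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link" >&2
        return 1
    fi
    # -d follows the link, so this fails when the target is missing
    if [ ! -d "$link" ]; then
        echo "$link points to a missing directory" >&2
        return 1
    fi
    echo "$link -> $(readlink "$link")"
}
```

Running check_blastdb_link in the directory that holds the link prints the link target on success and returns a non-zero status otherwise, so it can be chained in front of a blastn command with &&.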

Other options for the blastdblink command include choosing which snapshot to link and removing the link:

benji@marvin2 ~ * blastdblink -h
 Creates a link to a BLAST Database directory stored in ZFS 
 Usage: 
   blastdblink -h               Display this help message 
   blastdblink                  Creates a symbolic link in the current directory named blastdb
                                to the most recent BLASTDB available 
   blastdblink -d <link_name>   Creates a symbolic link at the path <link_name> to the most  
                                recent BLASTDB available 
   blastdblink -b <date>        Creates a symbolic link in the current directory named blastdb
                                to a BLASTDB from date <date> (if  available)
   blastdblink -l               List the BLASTDBs available
   blastdblink -u               Deletes the symbolic link to the BLAST database

Use the linked database

benji@marvin2 ~ * blastn -db blastdb/nt -query heuchera.fasta -max_target_seqs 100 -evalue 0.0001 -outfmt "10 staxid ssciname std" -num_threads 20 -out heuchera.blast.out
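
With -outfmt "10 staxid ssciname std" the output is comma-separated, with the subject taxonomy ID in column 1, the scientific name in column 2, and the twelve standard columns (qseqid through bitscore) after that. As a rough sketch of how such a file could be summarized, the hypothetical helper below counts hits per scientific name. It assumes that no field contains a comma.

```shell
# Hypothetical helper: count hits per scientific name (column 2) in a
# CSV file produced with -outfmt "10 staxid ssciname std".
hits_per_species() {
    awk -F, '{ n[$2]++ } END { for (s in n) print n[s] "\t" s }' "$1" |
        sort -t "$(printf '\t')" -k1,1nr -k2,2    # most hits first, then by name
}
```

For the search above, hits_per_species heuchera.blast.out would print one line per species, with the hit count and the name separated by a tab.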

When you’re done

benji@marvin2 ~ * blastdblink -u
attempting to remove symbolic link
Are you sure you want me to remove the link blastdb ? y

New NCBI version 5 databases

The ncbi-blast version 2.9.0 module now supports NCBI BLAST version 5 databases. You can use the version 5 databases on formic.ibest.uidaho.edu and on the cluster nodes with the ‘blastdbv5’ feature (currently n108, n109, n117).

Use the blastdblink5 command (same options as above) to create a symbolic link to the version 5 databases. Remember to create the link somewhere under /mnt/lfs2// when running jobs on the cluster. You only need to do this once, from a standalone server (formic), not in your sbatch script. When you submit your job to the cluster, add the constraint like this:

sbatch -C blastdbv5 myscript.slurm
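
For illustration, a batch script along these lines could be created from the shell as shown below. This is only a sketch under several assumptions: DB_LINK is a hypothetical environment variable holding the path of the link made with blastdblink5, the database name nt and the query file are carried over from the earlier example and may differ for the version 5 databases, and the thread count is arbitrary.

```shell
# Write a minimal batch script; everything between the EOF markers
# becomes the content of myscript.slurm.
cat > myscript.slurm <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=8         # assumed thread count; adjust as needed
#SBATCH --constraint=blastdbv5    # same effect as "sbatch -C blastdbv5"

# DB_LINK is a hypothetical variable: the link created earlier with
# blastdblink5 somewhere under /mnt/lfs2. The script stops if it is unset.
: "${DB_LINK:?set DB_LINK to your blastdblink5 link}"

# Load the BLAST module; you may need to name the 2.9.0 version explicitly.
module load ncbi-blast

# The database name (nt) and the query file are assumptions.
blastn -db "$DB_LINK/nt" -query heuchera.fasta \
    -num_threads "$SLURM_CPUS_PER_TASK" -out heuchera.blast.out
EOF
```

With the constraint written into the script, the -C option on the command line becomes optional. Export DB_LINK in your shell before submitting; by default sbatch passes the submitting shell's environment on to the job.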