Clustal Omega

Clustal Omega is a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences.

website

The Clustal Omega authors claim it to be superior to ClustalW.

Usage

module load clustalo
clustalo -i my.input.fasta -o aligned.fasta

Command line options

ibestadm@fortytwo ~ $ clustalo --help
Clustal Omega - 1.2.0 (AndreaGiacomo)

If you like Clustal-Omega please cite:
 Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG.
 Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.
 Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75. PMID: 21988835.
If you don't like Clustal-Omega, please let us know why (and cite us anyway).

Check http://www.clustal.org for more information and updates.

Usage: clustalo [-hv] [-i {<file>,-}] [--hmm-in=<file>]... [--dealign] [--profile1=<file>] [--profile2=<file>] [--is-profile] [-t {Protein, RNA, DNA}] [--infmt={a2m=fa[sta],clu[stal],msf,phy[lip],selex,st[ockholm],vie[nna]}] [--distmat-in=<file>] [--distmat-out=<file>] [--guidetree-in=<file>] [--guidetree-out=<file>] [--full] [--full-iter] [--cluster-size=<n>] [--clustering-out=<file>] [--use-kimura] [--percent-id] [-o {file,-}] [--outfmt={a2m=fa[sta],clu[stal],msf,phy[lip],selex,st[ockholm],vie[nna]}] [--residuenumber] [--wrap=<n>] [--output-order={input-order,tree-order}] [--iterations=<n>] [--max-guidetree-iterations=<n>] [--max-hmm-iterations=<n>] [--maxnumseq=<n>] [--maxseqlen=<l>] [--auto] [--threads=<n>] [-l <file>] [--version] [--long-version] [--force] [--MAC-RAM=<n>]

A typical invocation would be: clustalo -i my-in-seqs.fa -o my-out-seqs.fa -v
See below for a list of all options.

Sequence Input:
  -i, --in, --infile={<file>,-} Multiple sequence input file (- for stdin)
  --hmm-in=<file>           HMM input files
  --dealign                 Dealign input sequences
  --profile1, --p1=<file>   Pre-aligned multiple sequence file (aligned columns will be kept fix)
  --profile2, --p2=<file>   Pre-aligned multiple sequence file (aligned columns will be kept fix)
  --is-profile              disable check if profile, force profile (default no)
  -t, --seqtype={Protein, RNA, DNA} Force a sequence type (default: auto)
  --infmt={a2m=fa[sta],clu[stal],msf,phy[lip],selex,st[ockholm],vie[nna]} Forced sequence input file format (default: auto)

Clustering:
  --distmat-in=<file>       Pairwise distance matrix input file (skips distance computation)
  --distmat-out=<file>      Pairwise distance matrix output file
  --guidetree-in=<file>     Guide tree input file (skips distance computation and guide-tree clustering step)
  --guidetree-out=<file>    Guide tree output file
  --full                    Use full distance matrix for guide-tree calculation (might be slow; mBed is default)
  --full-iter               Use full distance matrix for guide-tree calculation during iteration (might be slowish; mBed is default)
  --cluster-size=<n>        soft maximum of sequences in sub-clusters
  --clustering-out=<file>   Clustering output file
  --use-kimura              use Kimura distance correction for aligned sequences (default no)
  --percent-id              convert distances into percent identities (default no)

Alignment Output:
  -o, --out, --outfile={file,-} Multiple sequence alignment output file (default: stdout)
  --outfmt={a2m=fa[sta],clu[stal],msf,phy[lip],selex,st[ockholm],vie[nna]} MSA output file format (default: fasta)
  --residuenumber, --resno  in Clustal format print residue numbers (default no)
  --wrap=<n>                number of residues before line-wrap in output
  --output-order={input-order,tree-order} MSA output orderlike in input/guide-tree

Iteration:
  --iterations, --iter=<n>  Number of (combined guide-tree/HMM) iterations
  --max-guidetree-iterations=<n> Maximum number guidetree iterations
  --max-hmm-iterations=<n>  Maximum number of HMM iterations

Limits (will exit early, if exceeded):
  --maxnumseq=<n>           Maximum allowed number of sequences
  --maxseqlen=<l>           Maximum allowed sequence length

Miscellaneous:
  --auto                    Set options automatically (might overwrite some of your options)
  --threads=<n>             Number of processors to use
  -l, --log=<file>          Log all non-essential output to this file
  -h, --help                Print this help and exit
  -v, --verbose             Verbose output (increases if given multiple times)
  --version                 Print version information and exit
  --long-version            Print long version information and exit
  --force                   Force file overwriting