BLAST parameters
BLAST is a complex program and takes a large number of parameters that influence how it performs the homology search and formats its output. Here I discuss the parameters for the command line version of BLAST. Please note that the BLAST parameters are complex and difficult to understand at first, and some are explained very poorly in the original BLAST documentation. For beginners, the best option is therefore to use default values where possible. That said, you can significantly improve BLAST results by adjusting some parameters, especially for short oligonucleotide sequences.
You can always get a help text, including a tabular listing of the available parameters, from BLAST by starting it with the parameter --help, like so:
blastall --help
Note that in the examples here, parameters are immediately followed by their value, with no equals sign between the parameter key and the value. So for example, the database is specified by ‘-dfilename’, not by ‘-d=filename’. A space between key and value (‘-d filename’) is also accepted, and is the form you will usually see in NCBI’s own examples.
-p[program]:
This specifies which BLAST subprogram out of the family of BLAST programs to use. ‘blastn’ performs a nucleotide against nucleotide BLAST, i.e. the query sequence and the BLAST database are DNA sequences. ‘blastp’ performs a protein against protein search. ‘blastx’ is interesting: It will translate the specified DNA sequence query into a protein sequence, using all six possible translation frames, and search for the resulting protein sequences in a protein database. ‘tblastn’ and ‘tblastx’ search a translated nucleotide database, with a protein query or a translated nucleotide query respectively.
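To make the "six possible translation frames" concrete, here is a minimal Python sketch of what a six-frame translation looks like. It assumes the standard genetic code (NCBI translation table 1) and is only an illustration of the idea, not how BLAST implements it internally; the function names are my own.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six protein strings is what would then be compared against the protein database; ‘*’ marks a stop codon.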
-d[database]:
This is the path to the BLAST database. Note that BLAST databases usually consist of multiple files with the same name but different extensions. For instance, the UniVec database provided with a BLAST installation consists of ‘UniVec.nhr’, ‘UniVec.nin’ and ‘UniVec.nsq’. The value must include the complete path up to the database name, but exclude the extension. So for UniVec, you would specify
-dUniVec
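A quick sanity check before starting a search is to verify that the files behind a database name actually exist. The sketch below assumes a single-volume nucleotide database with exactly the three extensions mentioned above; the helper name is hypothetical, and larger databases that are split into several volumes would need a different check.

```python
from pathlib import Path

# Extensions of a single-volume nucleotide database, as listed for UniVec.
NUCLEOTIDE_DB_EXTENSIONS = (".nhr", ".nin", ".nsq")


def missing_db_files(db_path):
    """Return the expected database files that do not exist for a -d value."""
    return [
        db_path + ext
        for ext in NUCLEOTIDE_DB_EXTENSIONS
        if not Path(db_path + ext).is_file()
    ]


missing = missing_db_files("UniVec")
if missing:
    print("Missing database files:", ", ".join(missing))
else:
    print("All database files found.")
```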
-i[file]:
The file that contains the query sequence. If you omit this, BLAST reads the query from standard input, so you can type or pipe a sequence in on the command line.
-e[value]:
The Expect value cutoff. Results with an Expect value above this threshold will not be returned by BLAST. This is a convenient way to limit the output to relevant hits. The default value is 10.0 and is a good starting point. Note that for short sequences, even very good matches usually have a high Expect value, so in these cases you need to raise the cutoff, for example to 100.0 or 1000.0.
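A rough way to see why short queries end up with high Expect values is the textbook relation E = m · n · 2^(-S'), where m is the query length, n the database length and S' the bit score of the hit. The sketch below uses that simplified relation; it ignores the effective length corrections BLAST applies, and the bit scores and database size are made-up illustrative numbers.

```python
def expect_value(bit_score, query_length, database_length):
    """Simplified Expect value: E = m * n * 2^(-S'), without length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


DATABASE_LENGTH = 10**9  # hypothetical database of one billion nucleotides

# A short hit can only reach a low bit score, so its E value stays high.
short_hit = expect_value(bit_score=24, query_length=12, database_length=DATABASE_LENGTH)
# A longer hit with a higher bit score gets a far smaller E value.
long_hit = expect_value(bit_score=100, query_length=60, database_length=DATABASE_LENGTH)

print(f"short query: E = {short_hit:.1f}")
print(f"long query:  E = {long_hit:.2e}")
```

With these numbers the short hit lands well above the default cutoff of 10.0 and would be discarded unless the cutoff is raised.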
-m[number]:
The output format for alignments. This is a number in the range 0 to 11. Each number stands for a specific output format. The default value is 0, which is pairwise alignments. You can list all available output formats by calling blastall with ‘--help’. An interesting option is ‘-m7’, which returns XML output.
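XML output is mainly interesting because it is easy to process with a script. The following sketch pulls the hit description and the best Expect value per hit out of an XML report with Python's standard library. The sample document is a heavily shortened, hand-written fragment with invented hits; it only contains the elements the code reads, so treat it as an assumption about the layout rather than a complete report.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily shortened sample; a real report has many more elements.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical vector sequence A</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>1.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical vector sequence B</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>3.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_evalue_per_hit(xml_text):
    """Return (hit description, lowest Expect value) for every hit."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
        results.append((hit.findtext("Hit_def"), min(evalues)))
    return results


for description, evalue in best_evalue_per_hit(SAMPLE_XML):
    print(f"{evalue:g}\t{description}")
```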
-o[file]:
The output file. If this is not specified, BLAST will print the output onto the console.
-F[T/F]:
This can be either ‘true’ (‘-FT’) or ‘false’ (‘-FF’). If true, filtering is applied to the input sequences. This is an important option as it can significantly distort your results. Filtering means that subsequences of low compositional complexity (for example, long runs of a single nucleotide or short repeats) are masked and not used in the search. This can be very useful if you want to avoid irrelevant hits, but can be dangerous if you need to find all possible hits, for example to see the cross hybridization tendency of a PCR primer. For blastn, filtering is done using the DUST tool. For others, the SEG tool is used.
-G[number]:
The cost to open a gap. By default, BLAST can perform alignments with gaps, i.e. where one or more nucleotides in the query or subject sequence have no counterpart in the other sequence. In order to calculate optimal alignments, BLAST needs to assign a cost to new gaps. Every time a gap is inserted in an alignment, the cost specified here is charged for it. The default value is -1, which is not a literal cost but tells BLAST to use the built-in default for the selected program. If you want to discourage BLAST from opening gaps, increase this value.
-E[number]:
This is the cost of extending an existing gap. In many alignment models, extending gaps is ‘cheaper’ than opening them. The rationale behind this is that natural changes to genes often include the insertion or deletion of a string of nucleotides, and thus the length of a gap is not as relevant as the fact that it exists at all. The default value is -1, which again means that the built-in default for the selected program is used. If you increase the gap opening cost, you may consider changing the gap extension cost accordingly.
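The interplay of -G and -E is easiest to see with a bit of arithmetic. The sketch below assumes the usual affine scheme in which a gap of length k costs G + k · E; the values 5 and 2 are only illustrative numbers chosen for the example.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of the given length under an affine scheme: G + k * E."""
    if length < 1:
        raise ValueError("a gap must span at least one position")
    return gap_open + length * gap_extend


GAP_OPEN = 5     # illustrative -G value
GAP_EXTEND = 2   # illustrative -E value

one_long_gap = gap_cost(3, GAP_OPEN, GAP_EXTEND)
three_short_gaps = 3 * gap_cost(1, GAP_OPEN, GAP_EXTEND)

print("one gap of length 3:   ", one_long_gap)      # 5 + 3 * 2 = 11
print("three gaps of length 1:", three_short_gaps)  # 3 * (5 + 2) = 21
```

One contiguous gap is much cheaper than the same number of gapped positions scattered over the alignment, which is exactly the behaviour the rationale above asks for.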
-X[number]:
According to the BLAST help, this is the ‘X dropoff value for gapped alignment (in bits)’. As of now, it is completely unclear to me what this means. Stay tuned for an update!
-I[T/F]:
If true, GenInfo identifiers (GIs) are shown in the definition lines of the BLAST output. The default is false.
-q[value]:
The penalty (cost) for a nucleotide mismatch in blastn. The default value is -3. This is related to -G and -E, the costs for opening and extending a gap. In an alignment, the relevance of mismatches is determined by this mismatch penalty. Depending on whether the cost for a mismatch or for an opened gap is higher, the alignment algorithm will prefer one or the other, which will affect the quality of the alignment. The default values usually work very well, but if you want to change them, you should experiment with different numbers for -q, -r, -G and -E.
-r[value]:
The reward for a nucleotide match in blastn. This corresponds to -q for the case of two matching nucleotides. The default value is 1. Note that the -q parameter should always be negative (it is a penalty) whereas the -r value should always be positive.
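To get a feeling for -r and -q, it helps to score a small ungapped alignment by hand. This sketch simply adds the reward for every matching position and the penalty for every mismatching one, using the defaults named above; the sequences are made up.

```python
def ungapped_score(query, subject, reward=1, penalty=-3):
    """Raw score of two aligned sequences of equal length, without gaps."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(reward if q == s else penalty for q, s in zip(query, subject))


perfect = ungapped_score("ACGTACGT", "ACGTACGT")
one_mismatch = ungapped_score("ACGTACGT", "ACGAACGT")
lenient = ungapped_score("ACGTACGT", "ACGAACGT", reward=1, penalty=-1)

print(perfect)       # 8 matches            -> 8
print(one_mismatch)  # 7 matches, 1 mismatch -> 7 - 3 = 4
print(lenient)       # same, with -q-1      -> 7 - 1 = 6
```

With the default penalty, a single mismatch in an 8-mer halves the score, which is why short sequences are so sensitive to these two values.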
-v[number]:
The number of database sequences to show one-line descriptions for. The BLAST output may consist of a large number of sequences, and with this parameter you can limit the amount of output. However, be careful not to suppress valuable homology information in the output. I recommend setting this parameter to a value lower than the default of 500 only after you have looked at the full output and have determined that it is too long.
-b[number]:
Like -v, this limits the output, but this time the number of alignments to show. Alignments take up much screen space, and printing too many of them may be confusing. Therefore, you can limit the alignments to the top few hits. The default is 250. Since you will still see one-line descriptions for suppressed alignments, you can set this parameter to a very low number.
-f[number]:
According to the BLAST help, this is the ‘threshold for extending hits’. Wait for an update to see a more detailed description.
-g[T/F]:
If true, a gapped alignment is performed (the default), otherwise an ungapped one. Note that if you suspect that your query sequence does not match the database perfectly, an ungapped alignment may lead to too few or no hits, as it may be necessary to insert gaps to get a proper alignment.
-Q[number]:
The query genetic code to use. This is relevant only for programs that translate the query, i.e. blastx and tblastx, where the search sequence is translated into a protein sequence. I am still investigating which number codes refer to which genetic code. The default is 1.
-D[number]:
Like -Q, this specifies a genetic code, but this time for the database. This is used only by the programs that translate the database, i.e. tblastn and tblastx.
-a[number]:
Number of processors to use. The default is 1. If you do not know the machine on which you use BLAST very well, and especially if other programs and services are running on it (or other users are working on it in a multi-user environment), you should leave this at 1. Otherwise, if you actually have more than one processor, you can increase this number. Be aware, however, that if you use all processors, the system as a whole may become sluggish or unresponsive during a BLAST search.
-O[file]:
An optional SeqAlign file that will be created.
-J[T/F]:
If true, BLAST believes the query defline, otherwise it does not. Still need to find out what that means.
-M[name]:
The name of the alignment matrix to use. The default is BLOSUM62. The alignment matrix contains a reward or penalty for each possible match or mismatch. BLOSUM62 is a protein matrix, so it contains a score for each possible pair of amino acids; for nucleotide searches with blastn, the match and mismatch scores are set with -r and -q instead. The default BLOSUM62 matrix is a very good one for the detection of moderate homologies.
-W[number]:
This is one of the most important parameters, the word size. The default is 11 for blastn, 28 for megablast and 3 for all others. In order to find matches quickly, BLAST looks for perfect matches across subsequences of the length of this word size. Therefore, if the word size is very small, many more chance matches have to be examined and the search takes longer. If the word size becomes large, imperfect matches (e.g. with a single nucleotide insertion) are no longer found. For short oligonucleotides, the default word size of 11 for blastn is a bit high – I recommend a smaller word size such as 5 or 7.
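The effect of the word size on short sequences can be shown with a toy seed search. The sketch below lists all positions where query and subject share an exact word of a given length; it is a simplified illustration of the seeding idea, not BLAST's actual lookup code, and the 20-mer is invented.

```python
def shared_words(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word match starts."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


query = "ACGTTGCAAGGCTTACCGAT"                # made-up 20-nt oligonucleotide
subject = query[:10] + "T" + query[11:]       # same sequence, one central mismatch

print("W=11:", shared_words(query, subject, 11))
print("W=7: ", shared_words(query, subject, 7))
```

With a word size of 11, the single mismatch in the middle leaves no exact 11-mer in common, so the hit would never be seeded. With a word size of 7, several seeds on both sides of the mismatch remain.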
-z[number]:
Effective length of the database. The default value of 0 uses the real length. I don’t know where or why this is relevant, but it possibly affects the calculation of Expect values.
-K[number]:
Number of best hits from a region to keep. Still need to find out what exactly that means.
-P:
Still need to find out what exactly this parameter means.
-Y:
Effective length of the search space. The default value of 0 uses the real length. As for -z, I don’t know where this is used, but it probably affects the calculation of Expect values.
-S[number]:
The query strands to search in the database (for DNA inputs). 1 is top, 2 is bottom (reverse complementary) and 3 is both. The default is 3, i.e. both, which is the safest option.
-T[T/F]:
If true, the output will be HTML, not text. This is useful if you want to put the output onto a website, integrate it into a web application or view it in a browser (which may be easier to read than the text output).
-l[String]:
Restrict the database search to this list of GenInfo identifiers (GIs).
-U[T/F]:
Use lower-case filtering of FASTA sequences. If this is true, lower case characters in FASTA sequences are ignored in the search, and only upper case characters are searched. The default is false, i.e. all sequence characters are searched.
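If you plan to use lower-case filtering, it is worth checking which parts of your query are actually written in lower case. This small sketch reports the lower-case stretches of a sequence as half-open (start, end) intervals; it only illustrates which regions would be affected and says nothing about how BLAST handles them internally.

```python
import re


def lowercase_regions(sequence):
    """Return (start, end) half-open intervals of lower-case runs."""
    return [match.span() for match in re.finditer(r"[a-z]+", sequence)]


sequence = "ACGTacgtacgtACGTaa"  # made-up example
for start, end in lowercase_regions(sequence):
    print(f"positions {start}-{end - 1}: {sequence[start:end]}")
```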
-y[number]:
According to BLAST help, ‘X dropoff value for ungapped extensions in bits’. As with -X, I don’t know what that’s supposed to mean.
-Z[number]:
According to BLAST help, ‘X dropoff value for final gapped alignment in bits’. Same as -y and -X, I don’t know what that’s supposed to mean.
-R[file]:
BLAST help says, ‘PSI-TBLASTN checkpoint file’.
-n[T/F]:
If true, a Megablast search is performed.
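To tie several of the parameters together, here is a sketch of how a command line for a short oligonucleotide search could be assembled from a script. It follows the recommendations made above (higher Expect cutoff, smaller word size, filtering off) and writes each key directly followed by its value. The file names ‘primer.fasta’ and ‘primer_hits.txt’ are hypothetical, and the snippet only builds and prints the command; it does not run BLAST.

```python
import shlex


def blastall_command(program, database, query_file, **options):
    """Build a blastall argument list with each key directly followed by its value."""
    args = ["blastall", f"-p{program}", f"-d{database}", f"-i{query_file}"]
    for flag, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # BLAST expects T/F for switches
        args.append(f"-{flag}{value}")
    return args


cmd = blastall_command(
    "blastn", "UniVec", "primer.fasta",  # hypothetical query file
    e=1000,                              # raised Expect cutoff for a short query
    W=7,                                 # smaller word size
    F=False,                             # no low-complexity filtering
    o="primer_hits.txt",                 # hypothetical output file
)
print(shlex.join(cmd))
# blastall -pblastn -dUniVec -iprimer.fasta -e1000 -W7 -FF -oprimer_hits.txt
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where blastall is installed.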