LASTZ -- Tool for (1) pairwise DNA sequence alignment and (2) alignment score inference.
Platform: | This package was developed on a Macintosh OS X system, but should work on Linux or other Unix platforms with little (if any) change. LASTZ is written in C and compiled with gcc. Some ancillary tools are written in Python, but use only modules available in typical Python installations. |
Author: | Bob Harris, <rsharris at bx dot psu dot edu> |
Date: | July 23, 2008 |
This is a preliminary document covering installation, common options, and support for yasra. A more detailed document describing additional features is forthcoming.
If you have received the distribution as a packed archive, unpack the archive
by whatever means are appropriate for your computer. The result should be a
directory <somepath>/lastz-distrib-X.XX.XX that contains a src subdirectory
(and some others). You may find it convenient to remove the revision number
(-X.XX.XX) from the directory name.
Before building or installing any of the programs, you will need to do one
of two things. Either create the shell variable
$LASTZ = <somepath>/lastz-distrib-X.XX.XX and add
<somepath>/lastz-distrib-X.XX.XX to your $PATH, or edit
<somepath>/lastz-distrib-X.XX.XX/make-include.mak and change the definition
of installDir to some directory already in your path.
Then, to build the LASTZ executable, from bash (or a similar command-line
shell), run the commands below. This will build two executables (lastz and
lastz_D) and copy them into your installDir.
cd <somepath>/lastz-distrib-X.XX.XX/src
make
make install
The two executables are the same program. lastz uses integer scores, while lastz_D uses floating-point scores.
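After make install, one quick way to confirm that both executables ended up in a directory on your $PATH is to look them up by name. The sketch below is not part of the LASTZ distribution; the helper name find_lastz_executables is hypothetical, and it relies only on Python's standard shutil.which.

```python
import shutil

# Names of the two executables that the build is described as producing.
EXECUTABLES = ("lastz", "lastz_D")


def find_lastz_executables(names=EXECUTABLES):
    """Map each executable name to its full path on $PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_lastz_executables().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

If either name is reported as not found, the likely cause is that installDir (or the directory added to $PATH) is not the one the shell is actually searching.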
A simple self test is included so you can test that the build succeeded. To run it, do this command:
make test
If the test is successful, you will see no output from this command. Otherwise, you will see the differences between the expected output and the output of your build, plus a line that looks like this:
make: *** [test] Error 1
Aligning a human chromosome to a chicken chromosome
To run a quick low-sensitivity alignment of these sequences:
lastz hg18.chr4 galGal3.chr4 C=3 T=2 Z=10 --maf > hg18_4.galGal3_4.maf
Comparing shotgun reads to a human chromosome
lastz hg18.chr22 reads --yasra98 --maf > hg18_22.reads.maf
If you are familiar with BLASTZ, you can run LASTZ the same as you ran BLASTZ, with the same options and input files. In addition to BLASTZ compatibility, LASTZ provides other options.
The general format of the LASTZ command line is
lastz target_file query_file options
Command-line elements can appear in any order, the only constraint being that,
if present, the query_file must appear after the target_file. The target_file
and query_file are usually just the names of files containing the two sequences
to be aligned, in FASTA, nib, or 2bit format. However, they can also specify
subsequences; running
lastz --help=files
gives a description of file-related options.
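When lastz is driven from a script, the ordering rule above (query after target) is easy to enforce by assembling the argument list in one place. The following is a minimal sketch under that assumption; build_lastz_command is a hypothetical helper, not something shipped with LASTZ, and it only constructs the argument list without running anything.

```python
def build_lastz_command(target_file, query_file=None, options=()):
    """Return a lastz argument list with the target placed before the query.

    Options are appended unchanged; per the text above they could appear
    anywhere, so putting them last is just a convention of this sketch.
    """
    command = ["lastz", target_file]
    if query_file is not None:
        command.append(query_file)
    command.extend(options)
    return command


if __name__ == "__main__":
    args = build_lastz_command(
        "hg18.chr4", "galGal3.chr4", ["C=3", "T=2", "Z=10", "--maf"]
    )
    print(" ".join(args))
```

The resulting list can be handed to subprocess.run, with stdout pointed at an open file to take the place of the shell's > redirection.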
The general format for options is --<name> or --<name>=<value>.
For BLASTZ compatibility, some options can be set with <letter>=<number>.
Running the command lastz without specifiers or options gives a list
of the most commonly used options. Running
lastz --help
gives a list of all the options.
option | BLASTZ equivalent | meaning |
--both[strands] | B=2 | Search both strands. |
--plus[strand] | B=0 | Search forward strand only (strand matching the query file). |
--minus[strand] | B=-1 | Search reverse complement strand only (opposite strand of query file). |
(by default both strands are searched)
--seed=12of19 | T=1 or T=2 | Seed hits require matches in 12 specific positions of a 19 bp word. |
--seed=14of22 | T=3 or T=4 | Seed hits require matches in 14 specific positions of a 22 bp word. |
--seed=match(<n>) | W=<n> | Seed hits require a length-n match. |
--transition | T=1 or T=3 | Allow one transition in a seed hit. |
--transition=2 | | Allow two transitions in a seed hit. |
--notransition | T=2 or T=4 | Don't allow transitions in a seed hit. |
(by default the 12-of-19 seed is used, and one transition is allowed)
--step=<n> | Z=<n> | Number of bases between the starts of successive target words considered for seed matches. |
(by default, a step of 1 is used)
--gfextend | | Perform gap-free extension of seed hits to HSPs. |
--nogfextend | | Don't perform gap-free extension of seed hits to HSPs. |
--chain | C=1 or C=2 | Perform chaining of HSPs. |
--nochain | C=0 or C=3 | Don't perform chaining of HSPs. |
--gapped | C=0 or C=2 | Perform gapped alignment (instead of gap-free). |
--nogapped | C=1 or C=3 | Don't perform gapped alignment. |
(by default gapped alignment is performed, without chaining)
--scores=<file> | Q=<file> | Read substitution scores from a file. |
--match=<reward>,<penalty> | | Scores are +<reward>/-<penalty> for match/mismatch. |
(by default, HOXD70 scores are used)
--gap=<[open,]extend> | O=<score> E=<score> | Set gap open and extend penalties. |
(default is 400 for gap open, 30 for gap extend)
--xdrop=<score> | X=<score> | Set x-drop threshold. |
(default is 10 times the A-vs-A substitution score)
--ydrop=<score> | Y=<score> | Set y-drop threshold. |
(default is the score of a 300 base gap)
--hspthresh=<score> | K=<score> | Set threshold for high scoring pairs; ungapped extensions scoring lower are discarded. |
(default is 3000)
--gappedthresh=<score> | L=<score> | Set threshold for gapped alignments; gapped extensions scoring lower are discarded. |
(default is to use same value as --hspthresh)
--inner=<score> | H=<score> | Set threshold for HSPs during interpolation. |
(default is to not perform interpolation)
--entropy | P=1 | Involve entropy in filtering high scoring pairs. |
--noentropy | P=0 | Don't involve entropy in filtering high scoring pairs. |
(default is to involve entropy)
--traceback=<bytes> | m=<bytes> | Space for trace-back information. |
(default is 80.0M)
--identity=<min>[..<max>] | | Filter alignments by percent identity, 0 ≤ min ≤ max ≤ 100; blocks (or HSPs) outside the range are discarded. |
(default is no identity filtering)
--format=<type> | | Specify output format; one of lav, axt, maf or text. |
(by default output format is LAV)
--help | | List all options. |
--help=files | | List information about file specifiers. |
--help=shortcuts | | List BLASTZ-compatible shortcuts. |
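Several BLASTZ shortcuts bundle two long options into a single number (for example, T selects both a seed pattern and a transition setting, and C selects both chaining and gapped extension). The sketch below spells out that mapping exactly as the table gives it. The names ENUMERATED, VALUED, translate_shortcut and translate_all are hypothetical, and O=, E= and W= are left out to keep the example short.

```python
# Shortcuts whose value selects a fixed combination of long options (from the table).
ENUMERATED = {
    "B": {"2": ["--both"], "0": ["--plus"], "-1": ["--minus"]},
    "T": {
        "1": ["--seed=12of19", "--transition"],
        "2": ["--seed=12of19", "--notransition"],
        "3": ["--seed=14of22", "--transition"],
        "4": ["--seed=14of22", "--notransition"],
    },
    "C": {
        "0": ["--nochain", "--gapped"],
        "1": ["--chain", "--nogapped"],
        "2": ["--chain", "--gapped"],
        "3": ["--nochain", "--nogapped"],
    },
    "P": {"1": ["--entropy"], "0": ["--noentropy"]},
}

# Shortcuts whose value is passed straight through to a single long option.
VALUED = {
    "Z": "--step", "Q": "--scores", "X": "--xdrop", "Y": "--ydrop",
    "K": "--hspthresh", "L": "--gappedthresh", "H": "--inner", "m": "--traceback",
}


def translate_shortcut(shortcut):
    """Translate one <letter>=<value> shortcut into its long-option form."""
    letter, sep, value = shortcut.partition("=")
    if not sep:
        raise ValueError(f"not a <letter>=<value> shortcut: {shortcut!r}")
    if letter in ENUMERATED:
        if value not in ENUMERATED[letter]:
            raise ValueError(f"value not listed in the table: {shortcut!r}")
        return list(ENUMERATED[letter][value])
    if letter in VALUED:
        return [f"{VALUED[letter]}={value}"]
    raise ValueError(f"shortcut not covered by this sketch: {shortcut!r}")


def translate_all(shortcuts):
    """Translate a sequence of shortcuts, preserving their order."""
    options = []
    for shortcut in shortcuts:
        options.extend(translate_shortcut(shortcut))
    return options


if __name__ == "__main__":
    print(" ".join(translate_all(["C=3", "T=2", "Z=10"])))
```

Applied to the shortcuts in the human-versus-chicken example (C=3 T=2 Z=10), this reads as: no chaining, no gapped extension, the 12-of-19 seed with no transitions, and a step of 10.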
There are several options to support the yasra mapping assembler. These
provide canned sets of option settings that work well for aligning an assembled
reference sequence (as the target) with a set of shotgun reads (as the query).
The options relate to the expected level of identity between the sequences. For
example, --yasra90 should be used when we expect 90% identity. The
--yasraXXshort options are appropriate when the reads are very short (less
than 50 bp).
option | equivalent |
--yasra98 | T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98 |
--yasra95 | T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95 |
--yasra90 | T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90 |
--yasra85 | T=2 --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85 |
--yasra75 | T=2 --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75 |
--yasra95short | T=2 --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95 |
--yasra85short | T=2 --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85 |
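To see exactly what a preset stands for, or to start from one and adjust a single setting, it can help to hold the table above as data. This is a minimal sketch: YASRA_PRESETS and expand_yasra are hypothetical names, and the values are copied from the table rather than read from lastz itself.

```python
# Preset expansions copied from the table above.
YASRA_PRESETS = {
    "--yasra98": "T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98",
    "--yasra95": "T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95",
    "--yasra90": "T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90",
    "--yasra85": "T=2 --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85",
    "--yasra75": "T=2 --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75",
    "--yasra95short": "T=2 --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95",
    "--yasra85short": "T=2 --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85",
}


def expand_yasra(option):
    """Return the list of individual options that a yasra preset stands for."""
    if option not in YASRA_PRESETS:
        raise ValueError(f"unknown yasra preset: {option!r}")
    return YASRA_PRESETS[option].split()


if __name__ == "__main__":
    for name in YASRA_PRESETS:
        print(name, "=>", " ".join(expand_yasra(name)))
```

Laid out this way, one pattern in the table is easy to spot: only the three highest-identity presets (98, 95 and 90) include Z=20, so the others run with the default step of 1.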