LASTZ

Introduction

This document describes installation and usage of the LASTZ sequence alignment program. LASTZ is a drop-in replacement for BLASTZ: it is backward compatible with BLASTZ's command-line options and adds options of its own.

LASTZ -- a tool for (1) pairwise DNA sequence alignment and (2) inference of alignment scores.

Platform: This package was developed on a Macintosh running Mac OS X, but should work on Linux and other Unix platforms with little or no change. LASTZ is written in C and was compiled with gcc. Some ancillary tools are written in Python, but use only modules available in typical Python installations.
Author: Bob Harris <rsharris at bx dot psu dot edu>
Date: July 23, 2008

This is a preliminary document covering installation, common options, and support for yasra. A more detailed document describing additional features is forthcoming.

Installation

If you have received the distribution as a packed archive, unpack the archive by whatever means are appropriate for your computer. The result should be a directory <somepath>/lastz-distrib-X.XX.XX that contains a src subdirectory (and some others). You may find it convenient to remove the revision number (-X.XX.XX) from the directory name. Before building or installing any of the programs, you will need to do one of two things. Either create the shell variable LASTZ, for example in bash with

    export LASTZ=<somepath>/lastz-distrib-X.XX.XX

and add <somepath>/lastz-distrib-X.XX.XX to your $PATH, or edit

    <somepath>/lastz-distrib-X.XX.XX/make-include.mak

and change the definition of installDir to some directory already in your path.

Then, to build LASTZ, run the commands below from bash (or a similar command-line shell). This builds two executables (lastz and lastz_D) and copies them into your installDir.

    cd <somepath>/lastz-distrib-X.XX.XX/src
    make
    make install
The two executables are the same program, except that lastz uses integer scores while lastz_D uses floating-point scores.

A simple self-test is included so you can check that the build succeeded. To run it, use this command:

    make test
If the test is successful, you will see no output from this command. Otherwise, you will see the differences between the expected output and the output of your build, plus a line that looks like this:
    make: *** [test] Error 1

Examples

Aligning a human chromosome to a chicken chromosome

To run a quick, low-sensitivity alignment of these two sequences:

    lastz hg18.chr4 galGal3.chr4 C=3 T=2 Z=10 --maf > hg18_4.galGal3_4.maf
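In this command, C=3, T=2 and Z=10 are BLASTZ-style shortcuts. According to the Commonly-Used Options table later in this document, they turn off both chaining and gapped extension, select the 12-of-19 seed without transitions, and set a step of 10. The Python sketch below performs that translation mechanically. It is a hypothetical helper written for illustration, not part of the LASTZ distribution: it covers only the shortcuts listed in that table, passes everything else (including O= and E=) through unchanged, and lastz --help=shortcuts remains the authoritative list.

```python
# Hypothetical helper (not part of LASTZ): expand BLASTZ-style shortcuts
# into the long options given in the Commonly-Used Options table.

# Shortcuts whose value selects a fixed set of long options.
FIXED = {
    ("B", "2"): ["--both"],
    ("B", "0"): ["--plus"],
    ("B", "-1"): ["--minus"],
    ("C", "0"): ["--nochain", "--gapped"],
    ("C", "1"): ["--chain", "--nogapped"],
    ("C", "2"): ["--chain", "--gapped"],
    ("C", "3"): ["--nochain", "--nogapped"],
    ("T", "1"): ["--seed=12of19", "--transition"],
    ("T", "2"): ["--seed=12of19", "--notransition"],
    ("T", "3"): ["--seed=14of22", "--transition"],
    ("T", "4"): ["--seed=14of22", "--notransition"],
    ("P", "0"): ["--noentropy"],
    ("P", "1"): ["--entropy"],
}

# Shortcuts whose value is copied into a long option.
VALUED = {
    "W": "--seed=match({})",
    "Z": "--step={}",
    "Q": "--scores={}",
    "X": "--xdrop={}",
    "Y": "--ydrop={}",
    "K": "--hspthresh={}",
    "L": "--gappedthresh={}",
    "H": "--inner={}",
    "m": "--traceback={}",
}


def expand_shortcuts(args):
    """Return a copy of args with the shortcuts above replaced by long options.

    File names, long options and unlisted shortcuts (such as O= and E=,
    which together correspond to --gap) are passed through unchanged.
    """
    expanded = []
    for arg in args:
        letter, sep, value = arg.partition("=")
        if sep and len(letter) == 1 and letter.isalpha():
            if (letter, value) in FIXED:
                expanded.extend(FIXED[(letter, value)])
                continue
            if letter in VALUED:
                expanded.append(VALUED[letter].format(value))
                continue
        expanded.append(arg)  # not a shortcut this sketch knows about
    return expanded


print(" ".join(expand_shortcuts(
    ["hg18.chr4", "galGal3.chr4", "C=3", "T=2", "Z=10", "--maf"])))
# hg18.chr4 galGal3.chr4 --nochain --nogapped --seed=12of19 --notransition --step=10 --maf
```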

Comparing shotgun reads to a human chromosome

    lastz hg18.chr22 reads --yasra98 --maf > hg18_22.reads.maf

Command-Line Options

If you are familiar with BLASTZ, you can run LASTZ the same way you ran BLASTZ, with the same options and input files. Beyond BLASTZ compatibility, LASTZ also provides options of its own.

The general format of the LASTZ command line is

    lastz target_file query_file options
Command-line elements can appear in any order; the only constraint is that, if present, the query_file must appear after the target_file. The target_file and query_file are usually just the names of files containing the two sequences to be aligned, in FASTA, nib, or 2bit format. However, they can also specify subsequences; running
    lastz --help=files
gives a description of file-related options.

The general format for options is --<name> or --<name>=<value>. For BLASTZ compatibility, some options can also be set with <letter>=<value>, where the value is usually a number.
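To make these rules concrete, here is a small Python sketch that sorts command-line tokens the way they are described above: tokens starting with -- are long options, tokens of the form <letter>=<value> are BLASTZ-style shortcuts, and anything else is a sequence file, the first being the target and the second the query. It is an illustration under those assumptions, not LASTZ's actual parser; in particular it does not interpret subsequence specifiers or check that option names exist.

```python
import re

# Assumption: any token of the form <letter>=<value> is a BLASTZ-style shortcut.
SHORTCUT = re.compile(r"^([A-Za-z])=(.+)$")


def classify(argv):
    """Split LASTZ-style arguments into (target, query, long_options, shortcuts).

    Long options are returned as (name, value) pairs, with value None when the
    option has no '=<value>' part. Sequence files are taken in order of
    appearance: the first is the target, the second (if any) the query.
    """
    files, long_opts, shortcuts = [], [], []
    for token in argv:
        if token.startswith("--"):
            name, sep, value = token[2:].partition("=")
            long_opts.append((name, value if sep else None))
        else:
            match = SHORTCUT.match(token)
            if match:
                shortcuts.append((match.group(1), match.group(2)))
            else:
                files.append(token)
    if len(files) > 2:
        raise ValueError("expected at most two sequence files: target, then query")
    target = files[0] if files else None
    query = files[1] if len(files) > 1 else None
    return target, query, long_opts, shortcuts


# Options, shortcuts and files may be interleaved; only the file order matters.
print(classify(["C=3", "hg18.chr4", "--maf", "galGal3.chr4", "T=2", "--step=10"]))
# ('hg18.chr4', 'galGal3.chr4', [('maf', None), ('step', '10')], [('C', '3'), ('T', '2')])
```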

Running the command lastz with no file specifiers or options gives a list of the most commonly used options. Running

    lastz --help
gives a list of all the options.

Commonly-Used Options

option                      BLASTZ equivalent    meaning
--both[strands]             B=2                  Search both strands.
--plus[strand]              B=0                  Search the forward strand only (the strand matching the query file).
--minus[strand]             B=-1                 Search the reverse-complement strand only (the strand opposite the query file).
    (by default both strands are searched)
--seed=12of19               T=1 or T=2           Seed hits require matches in 12 specific positions of a 19 bp word.
--seed=14of22               T=3 or T=4           Seed hits require matches in 14 specific positions of a 22 bp word.
--seed=match(<n>)           W=<n>                Seed hits require a match of length <n>.
--transition                T=1 or T=3           Allow one transition in a seed hit.
--transition=2                                   Allow two transitions in a seed hit.
--notransition              T=2 or T=4           Don't allow transitions in a seed hit.
    (by default the 12-of-19 seed is used, and one transition is allowed)
--step=<n>                  Z=<n>                Number of bases between the starts of successive target words considered for seed matches.
    (by default, a step of 1 is used)
--gfextend                                       Perform gap-free extension of seed hits to HSPs.
--nogfextend                                     Don't perform gap-free extension of seed hits to HSPs.
--chain                     C=1 or C=2           Perform chaining of HSPs.
--nochain                   C=0 or C=3           Don't perform chaining of HSPs.
--gapped                    C=0 or C=2           Perform gapped alignment (instead of gap-free).
--nogapped                  C=1 or C=3           Don't perform gapped alignment.
    (by default gapped alignment is performed, without chaining)
--scores=<file>             Q=<file>             Read substitution scores from a file.
--match=<reward>,<penalty>                       Scores are +<reward>/-<penalty> for match/mismatch.
    (by default, the HOXD70 scores below are used)
              A     C     G     T
        A    91  -114   -31  -123
        C  -114   100  -125   -31
        G   -31  -125   100  -114
        T  -123   -31  -114    91
--gap=[<open>,]<extend>     O=<score> E=<score>  Set the gap open and gap extend penalties.
    (default is 400 for gap open, 30 for gap extend)
--xdrop=<score>             X=<score>            Set the x-drop threshold.
    (default is 10 times the A-vs-A substitution score)
--ydrop=<score>             Y=<score>            Set the y-drop threshold.
    (default is the score of a 300-base gap)
--hspthresh=<score>         K=<score>            Set the threshold for high-scoring pairs (HSPs); ungapped extensions scoring lower are discarded.
    (default is 3000)
--gappedthresh=<score>      L=<score>            Set the threshold for gapped alignments; gapped extensions scoring lower are discarded.
    (default is to use the same value as --hspthresh)
--inner=<score>             H=<score>            Set the threshold for HSPs during interpolation.
    (by default, interpolation is not performed)
--entropy                   P=1                  Take entropy into account when filtering high-scoring pairs.
--noentropy                 P=0                  Don't take entropy into account when filtering high-scoring pairs.
    (by default, entropy is taken into account)
--traceback=<bytes>         m=<bytes>            Set the space reserved for trace-back information.
    (default is 80.0M)
--identity=<min>[..<max>]                        Filter alignments by percent identity, 0 ≤ min ≤ max ≤ 100; blocks (or HSPs) outside the range are discarded.
    (by default, no identity filtering is done)
--format=<type>                                  Specify the output format; one of lav, axt, maf, or text.
    (by default, the output format is LAV)
--help                                           List all options.
--help=files                                     List information about file specifiers.
--help=shortcuts                                 List BLASTZ-compatible shortcuts.
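The seed and transition options can be illustrated with a brute-force Python sketch. It assumes a spaced-seed pattern written as a string of 1s (positions that must match) and 0s (positions that are ignored); the 12-of-19 pattern used here is an illustrative example, since this document does not give the exact pattern LASTZ uses. A transition is an A/G or C/T substitution; the sketch tolerates up to max_transitions of them at must-match positions, so max_transitions=1 models --transition, 2 models --transition=2, and 0 models --notransition. The step parameter mimics --step by considering only every step-th target word. A real aligner would index words rather than compare every pair, so treat this as a model of which word pairs qualify as seed hits, not of how LASTZ finds them.

```python
# Illustrative 12-of-19 spaced seed: '1' = position that must match,
# '0' = position that is ignored. Example pattern only; not claimed to be
# the exact pattern LASTZ uses.
PATTERN_12OF19 = "1110100110010101111"

# Transitions: purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_seed_hit(target_word, query_word, pattern=PATTERN_12OF19, max_transitions=1):
    """True if the words agree at every '1' position of the pattern,
    tolerating up to max_transitions transition mismatches there."""
    if not len(target_word) == len(query_word) == len(pattern):
        raise ValueError("words must be as long as the seed pattern")
    transitions = 0
    for t, q, care in zip(target_word.upper(), query_word.upper(), pattern):
        if care == "0" or t == q:
            continue                    # ignored position, or an exact match
        if (t, q) in TRANSITIONS and transitions < max_transitions:
            transitions += 1            # tolerated transition
            continue
        return False                    # transversion, or too many transitions
    return True


def seed_hits(target, query, pattern=PATTERN_12OF19, max_transitions=1, step=1):
    """Yield (target_pos, query_pos) for every seed hit, by brute force.

    Only every step-th target word is considered (mimicking --step);
    every query word is tried.
    """
    width = len(pattern)
    for i in range(0, len(target) - width + 1, step):
        for j in range(len(query) - width + 1):
            if is_seed_hit(target[i:i + width], query[j:j + width],
                           pattern, max_transitions):
                yield i, j


target = "TTACGTACGTACGTACGTACG"
query = "GCGTACGTACGTACGTACG"   # one A->G transition at a must-match position
print(list(seed_hits(target, query)))                     # [(2, 0)]
print(list(seed_hits(target, query, max_transitions=0)))  # []
```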

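Some of the defaults in the table can be worked out from the HOXD70 matrix and the gap penalties. The Python sketch below scores a pairwise alignment written as two equal-length rows, with '-' marking gaps, and computes the default x-drop as 10 times the A-vs-A score (10 × 91 = 910). It assumes that a gap of length k costs open + k × extend; under that assumption the default y-drop, the score of a 300-base gap, is 400 + 300 × 30 = 9400. Treat the gap convention as an assumption of this sketch rather than a statement about LASTZ internals.

```python
# HOXD70 substitution scores from the table above
# (row = target base, column = query base).
HOXD70 = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}
GAP_OPEN, GAP_EXTEND = 400, 30   # documented defaults


def gap_penalty(length):
    # Assumption: a gap of length k costs open + k * extend.
    return GAP_OPEN + length * GAP_EXTEND


def score_alignment(target_row, query_row):
    """Score two equal-length alignment rows in which '-' marks a gap."""
    if len(target_row) != len(query_row):
        raise ValueError("alignment rows must have equal length")
    score, gap_len, gap_side = 0, 0, None
    for t, q in zip(target_row.upper(), query_row.upper()):
        side = "target" if t == "-" else "query" if q == "-" else None
        if side != gap_side and gap_len:
            score -= gap_penalty(gap_len)   # close the previous gap
            gap_len = 0
        gap_side = side
        if side:
            gap_len += 1                    # extend the current gap
        else:
            score += HOXD70[t][q]           # aligned pair of bases
    if gap_len:
        score -= gap_penalty(gap_len)       # gap running to the end of the rows
    return score


default_xdrop = 10 * HOXD70["A"]["A"]   # 10 x 91 = 910
default_ydrop = gap_penalty(300)        # 400 + 300 x 30 = 9400 (under the assumption above)

print(score_alignment("ACGT", "ACGT"))    # 382
print(score_alignment("ACGTA", "AC-TA"))  # 91 + 100 - 430 + 91 + 91 = -57
print(default_xdrop, default_ydrop)       # 910 9400
```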
Yasra-Specific Options

There are several options to support the yasra mapping assembler. These provide canned sets of option settings that work well for aligning an assembled reference sequence (as the target) with a set of shotgun reads (as the query). The options relate to the expected level of identity between the sequences; for example, --yasra90 should be used when we expect 90% identity. The --yasraXXshort options are appropriate when the reads are very short (less than 50 bp).

option           equivalent
--yasra98        T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98
--yasra95        T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95
--yasra90        T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90
--yasra85        T=2 --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85
--yasra75        T=2 --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75
--yasra95short   T=2 --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95
--yasra85short   T=2 --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85
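Choosing among these presets can be expressed as a small rule. The Python sketch below is a hypothetical helper, not part of LASTZ or yasra. It uses the short-read variants when reads are shorter than 50 bp, and otherwise picks the highest preset that does not exceed the expected identity. That second rule is an assumption motivated by the --identity filter each preset includes: a preset set above the true identity would discard alignments you want to keep.

```python
# Hypothetical helper (not part of LASTZ or yasra): choose a --yasra preset.
# Identity levels of the presets listed in the table above.
PRESETS = [98, 95, 90, 85, 75]
SHORT_PRESETS = [95, 85]   # the --yasraXXshort variants


def choose_yasra_option(expected_identity, read_length):
    """Return the preset option for the expected percent identity.

    Short-read variants are used for reads shorter than 50 bp. Among the
    applicable presets, the highest one not above expected_identity is chosen
    (an assumption: each preset's --identity filter discards alignments below
    its level, so a preset above the true identity would lose real matches).
    """
    short = read_length < 50
    levels = SHORT_PRESETS if short else PRESETS
    candidates = [level for level in levels if level <= expected_identity]
    if not candidates:
        raise ValueError(f"no yasra preset at or below {expected_identity}% identity")
    level = max(candidates)
    return f"--yasra{level}short" if short else f"--yasra{level}"


# Hypothetical inputs: expected identity (%) and read length (bp).
print(choose_yasra_option(98, 100))   # --yasra98
print(choose_yasra_option(88, 36))    # --yasra85short
```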