Help: File Formats

[This page does not yet have documentation for all of the formats, but what is here should be correct.]

UCSC Standard BED Format (.bed)

Browser Extensible Data format was designed at UCSC for displaying data tracks in the Genome Browser. When used by Galaxy, this format is tab-separated. It has three required fields and 12 additional optional ones. Files in this format must have the file extension '.bed'. More information is available in UCSC's document on custom tracks.

The first three BED fields (required) are:

  1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
  2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
  3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).
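
The zero-based, half-open convention described above can be illustrated with a short Python sketch. The function names are hypothetical and are not part of Galaxy or the UCSC tools; the sketch only shows the arithmetic implied by the definitions of chromStart and chromEnd.

```python
def bed_interval_length(chrom_start: int, chrom_end: int) -> int:
    """Number of bases covered by a zero-based, half-open BED interval."""
    if chrom_start < 0 or chrom_end < chrom_start:
        raise ValueError("expected 0 <= chromStart <= chromEnd")
    # chromEnd is one past the last base, so no "+ 1" is needed.
    return chrom_end - chrom_start


def bed_to_one_based(chrom_start: int, chrom_end: int) -> tuple:
    """Convert BED coordinates to one-based, fully closed (start, end)."""
    # Only the start shifts: chromEnd already equals the one-based
    # position of the last base in the interval.
    return chrom_start + 1, chrom_end
```

Under these assumptions, the first 100 bases of a chromosome have chromStart 0 and chromEnd 100, which gives a length of 100 and the one-based range 1 to 100.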

The 12 additional BED fields (optional) are:

  1. name - The name of the BED line.
  2. score - A score between 0 and 1000.
  3. strand - Defines the strand - either '+' or '-'.
  4. thickStart - The starting position where the feature is drawn thickly at the Genome Browser.
  5. thickEnd - The ending position where the feature is drawn thickly at the Genome Browser.
  6. reserved - This should always be set to zero.
  7. blockCount - The number of blocks (exons) in the BED line.
  8. blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
  9. blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
  10. expCount - The number of experiments.
  11. expIds - A comma-separated list of experiment ids. The number of items in this list should correspond to expCount.
  12. expScores - A comma-separated list of experiment scores. All of the expScores should be relative to expIds. The number of items in this list should correspond to expCount.

In order to use a field, all fields before it must be filled. The value used to indicate that a field is empty varies, as follows:

  1. The first three fields (chrom, chromStart, chromEnd) are required and must not be empty.
  2. For name, strand, expIds, and expScores: use a period '.'.
  3. For score, reserved, blockStarts, and expCount: use '0'.
  4. For blockCount: use '1'.
  5. For blockSizes: use chromEnd - chromStart.
  6. For thickStart: use chromStart.
  7. For thickEnd: use chromEnd.
Note that the value "NaN" (not-a-number) used by some databases is not supported; use the above values instead.
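
As a rough illustration of the rules above, the following Python sketch pads a partial BED line so that every field before the last one in use is filled with its documented empty value. The names OPTIONAL_FIELDS, empty_value, and pad_bed_line are hypothetical helpers invented for this example; they are not part of Galaxy.

```python
# Optional BED fields, in the order listed on this page.
OPTIONAL_FIELDS = [
    "name", "score", "strand", "thickStart", "thickEnd", "reserved",
    "blockCount", "blockSizes", "blockStarts",
    "expCount", "expIds", "expScores",
]


def empty_value(field: str, chrom_start: int, chrom_end: int) -> str:
    """Return the placeholder this page lists for an unused optional field."""
    if field in ("name", "strand", "expIds", "expScores"):
        return "."
    if field in ("score", "reserved", "blockStarts", "expCount"):
        return "0"
    if field == "blockCount":
        return "1"
    if field == "blockSizes":
        return str(chrom_end - chrom_start)
    if field == "thickStart":
        return str(chrom_start)
    if field == "thickEnd":
        return str(chrom_end)
    raise KeyError(field)


def pad_bed_line(chrom: str, chrom_start: int, chrom_end: int, values: dict) -> str:
    """Build a tab-separated BED line, filling gaps before the last used field."""
    unknown = set(values) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError("unknown BED fields: %s" % sorted(unknown))
    columns = [chrom, str(chrom_start), str(chrom_end)]
    if values:
        # Emit optional fields only up to the last one the caller supplied.
        last = max(OPTIONAL_FIELDS.index(name) for name in values)
        for field in OPTIONAL_FIELDS[: last + 1]:
            if field in values:
                columns.append(str(values[field]))
            else:
                columns.append(empty_value(field, chrom_start, chrom_end))
    return "\t".join(columns)
```

For example, supplying only a strand of '-' for chr1 100 200 would produce the six fields chr1, 100, 200, '.', 0, and '-', since name and score must be filled before strand can be used.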

Example

Here's an example of two BED format lines, shown beneath a row of field names:

chrom chromStart chromEnd name score strand thickStart thickEnd reserved blockCount blockSizes blockStarts
chr3 214671 265280 Hs.517745 300 + 214671 265280 0 3 104,80,2030, 0,46624,48579,
chrX 156881 157496 Hs.530320 300 + 156881 157496 0 2 231,384, 0,231,
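
To show how blockCount, blockSizes, and blockStarts fit together, here is a minimal Python sketch that converts the blocks of one BED line into absolute chromosome coordinates. The function names are hypothetical, and the sketch assumes a tab-separated line with at least the first 12 fields present; trailing commas in the lists, as in the example above, are tolerated.

```python
def parse_int_list(text: str) -> list:
    """Split a comma-separated list of integers, ignoring a trailing comma."""
    return [int(item) for item in text.split(",") if item]


def block_intervals(line: str) -> list:
    """Return absolute, half-open (start, end) pairs for each block of a BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("need at least 12 tab-separated fields")
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    block_count = int(fields[9])
    sizes = parse_int_list(fields[10])
    starts = parse_int_list(fields[11])
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockSizes and blockStarts must each have blockCount items")
    # blockStarts are relative to chromStart, so shift them to absolute positions.
    blocks = [(chrom_start + start, chrom_start + start + size)
              for start, size in zip(starts, sizes)]
    if any(end > chrom_end for _, end in blocks):
        raise ValueError("a block extends past chromEnd")
    return blocks
```

Applied to the chr3 line above, this yields the three blocks (214671, 214775), (261295, 261375), and (263250, 265280); the last block ends exactly at chromEnd.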

Galaxy Extended BED Format (.xbed)

Extended BED format is also tab-separated. The first 15 fields are the same as UCSC standard BED format, and it has three additional fields to accommodate multiple/flexible scores. Files in this format must have the file extension '.xbed'.

The three additional fields are:

  1. scoreCount - The number of scores in the extended BED line.
  2. scores - A comma-separated list of the scores. The number of items in this list should correspond to scoreCount.
  3. scoreNames - A comma-separated list of the score names. All of the names should be relative to scores. The number of items in this list should correspond to scoreCount.

As with standard BED format, all fields preceding the ones you want to use must be filled. The values used to indicate empty fields are the same as listed for UCSC standard BED format. Again, the value "NaN" (not-a-number) used by some databases is not supported.

Example

Here's an example of a complete line in extended BED format, shown beneath a row of field names:

chrom chromStart chromEnd name score strand thickStart thickEnd reserved blockCount blockSizes blockStarts expCount expIds expScores scoreCount scores scoreNames
chr15 93312259 93312615 Hs.269535 300 + 93312259 93312615 0 1 356, 0, 0 . . 2 80,20, name1,name2,
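
The three extra fields can be read with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the line is assumed to be tab-separated with all 18 fields present, scores are assumed to be numeric, and a lone period is treated as an empty list.

```python
def split_list(text: str) -> list:
    """Split a comma-separated field; '.' or an empty string means no items."""
    if text in (".", ""):
        return []
    return [item for item in text.split(",") if item]


def xbed_scores(line: str) -> dict:
    """Map each score name to its score for one extended BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 18:
        raise ValueError("need all 18 tab-separated fields")
    score_count = int(fields[15])                       # field 16: scoreCount
    scores = [float(s) for s in split_list(fields[16])]  # field 17: scores
    names = split_list(fields[17])                       # field 18: scoreNames
    if len(scores) != score_count or len(names) != score_count:
        raise ValueError("scores and scoreNames must each have scoreCount items")
    return dict(zip(names, scores))
```

For the example line above, this pairs name1 with 80 and name2 with 20.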

Galaxy Tab-Separated Format (.gtab)

This is similar to ordinary tab-separated format, with the additional restriction that the first three fields must be chrom, chromStart, and chromEnd (as in BED format). Thus this format is intermediate in flexibility, providing Galaxy with the main fields it needs to perform operations and some other analyses, without being as restrictive as BED with regard to the other fields.
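
A simple check for this restriction might look like the Python sketch below. The function name is hypothetical, and the specific tests (integer positions, with chromStart not greater than chromEnd) are assumptions carried over from the BED field definitions; this page does not spell out how Galaxy validates .gtab files.

```python
def looks_like_gtab(line: str) -> bool:
    """Check that a line starts with plausible chrom, chromStart, chromEnd fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or not fields[0]:
        return False
    try:
        chrom_start = int(fields[1])
        chrom_end = int(fields[2])
    except ValueError:
        # Positions must be integers; anything else fails the check.
        return False
    # Remaining fields are unrestricted, so they are not inspected.
    return 0 <= chrom_start <= chrom_end
```

Any number of further columns may follow the first three; the sketch deliberately ignores them.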

AXT Alignment Format (.axt)

This is a format for storing pairwise genomic sequence alignments. For more information, please see UCSC's document axt Alignment Format.

Stitched Alignments (.stitch)

The Stitch tool merges all alignment sets within an alignment (.axt, .stitch) into one alignment. A stitch file can contain multiple alignment sets. The format uses Genome Browser coordinates.

.stitch file format:

Example:

0	2	 hg17,mm5	Version=1
2	chr7,+,27055889,27056316,427,248330	chr7,+,27057906,27058163,257,248330
ATGGAGAGCCGAAAGGACATGGTTGTGTTTCTGGATGGGGGTCAGCTTGGCACTCTGGTTGGCAAGAGAGTCTCAAATTTGTCCGAAGCCGTGGGCAGCCCGCTGCCGGAGCCGCCCGAGAAAATG ...
2	chr6,+,52261447,52261874,427,248330	chr6,+,52263340,52263597,257,248330
ATGGAGAGCCGAAAGGACATGGTTATGTTTCTGGATGGGGGTCAGCTTGGCACTCTGGTTGGTAAGAGGGTCTCTAATTTGTCCGAAGCCGTGAGCAGCCCGCTGCCTGAACCGCCAGAGAAGATG ...

1	2	  hg17,mm5	Version=1
1	chr7,+,27058744,27059284,567,248330
GTGTGGTTCCAGAACCGGCGCATGAAGGACAAGCGGCAGCGCCTGGCCATGACGTGGCCGCACCCGGCGGACCCCGCCTTCTACACTTACATGATGAGCCATGCGGCGGCCGCGGGCGGCCTGCCC ...
1	chr6,+,52264137,52264704,567,248330
GTGTGGTTTCAGAACCGGCGCATGAAGGACAAGCGTCAGCGGCTGGCCATGACGTGGCCGCACCCGGCCGACCCTGCCTTCTACACCTACATGATGAGCCACGCGGCGGCCGCGGGCGGCCTGCCC ...