When a query is missing from a tool's selection box, the most common reason is that the tool lists only history items whose data format is compatible with the tool. Some formats are subsets of others, and Galaxy should also list items in compatible subformats. If the query still does not show up and you believe it is in the correct format, click the pencil icon and change the format manually. This does not edit the file; it only changes the file's metadata. In some cases you will need to change the file format itself. For example, if the file is space-delimited and a tabular file is required, the "Convert delimiters to TAB" tool under "Text Manipulation" can be used to reformat the file.
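The delimiter conversion described above can be illustrated outside Galaxy with a short Python sketch. This is not the implementation of the "Convert delimiters to TAB" tool; the function name `spaces_to_tabs` is hypothetical, and collapsing each run of spaces into a single tab is an assumption about what a space-delimited file needs.

```python
import re


def spaces_to_tabs(line: str) -> str:
    """Turn a space-delimited line into a tab-delimited one.

    Assumption: any run of one or more spaces separates two columns.
    Leading and trailing spaces are dropped so no empty edge columns appear.
    """
    cleaned = line.rstrip("\n").strip(" ")
    return re.sub(r" +", "\t", cleaned)


# Hypothetical space-delimited rows.
rows = ["chr1 10 100 +", "chrX  1000   10050 -"]
converted = [spaces_to_tabs(row) for row in rows]
```

A field that itself contains spaces (for example a free-text comment) would be split into several columns by this approach, so it only suits files whose values contain no spaces.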
Some of the most commonly used formats are very similar. Start with the basic tabular file, which has few requirements other than one or more columns of data separated by tabs. Next is interval, which is tabular with the added requirement that three of the columns must be the chromosome, start point, and end point. Optionally, there is a strand column and a header line labelling the columns. Next are BED and GFF, which are also tabular interval formats, but with more restrictions. BED can have between 3 and 12 columns, each precisely defined. Here the order of the columns also matters, and only the end columns can be skipped. Some groups of columns must be all present or all absent. GFF is similar in setup, but all 9 columns are required and their definitions differ. See the more detailed descriptions below.
A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploading the file.
BLASTZ pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields. Click here for more information about axt format.
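The block layout described above (one summary line with 9 fields, two sequence lines, blank lines between blocks) can be checked with a minimal Python sketch. The function name `parse_axt` and the sample data are hypothetical, and the sketch only validates the structure; it does not interpret the individual summary fields.

```python
import re


def parse_axt(text: str) -> list:
    """Split axt text into blocks and check each block's basic structure."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if len(lines) != 3:
            raise ValueError("expected a summary line and two sequence lines")
        summary = lines[0].split()
        if len(summary) != 9:
            raise ValueError("summary line must have 9 fields")
        blocks.append({"summary": summary, "sequences": (lines[1], lines[2])})
    return blocks


# Invented data for illustration only.
sample = (
    "0 chr1 1 9 chr2 101 109 + 500\n"
    "TCAGCTCAT\n"
    "TCTGTTCAT\n"
    "\n"
    "1 chr1 21 29 chr2 121 129 - 450\n"
    "CACAATCTT\n"
    "CACAGTCTT\n"
)
blocks = parse_axt(sample)
```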
A binary file compressed in the BGZF format with a '.bam' file extension. SAM format is the human readable text version of these files.
A zipped archive consisting of binary sequence files in either 'ab1' or 'scf' format. All files in this archive must have the same file extension which is one of '.ab1' or '.scf'. You must manually select this 'File Format' when uploading the file.
    chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
    chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
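As an illustration of how the 12 BED columns in the example rows line up, here is a small Python sketch. The column names follow the UCSC BED description rather than anything stated on this page, and `parse_bed_line` is a hypothetical helper. It splits on any whitespace because the example is shown with spaces; a real BED file is tab-delimited. It does not enforce the rule that certain groups of columns must appear together.

```python
BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",
]


def parse_bed_line(line: str) -> dict:
    """Map the columns of one BED row onto their names, in order."""
    values = line.split()
    if not 3 <= len(values) <= 12:
        raise ValueError("BED rows have between 3 and 12 columns")
    # zip stops at the shorter input, so skipped end columns are simply absent.
    record = dict(zip(BED_FIELDS, values))
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        if key in record:
            record[key] = int(record[key])
    for key in ("blockSizes", "blockStarts"):
        if key in record:
            # Comma-separated lists may carry a trailing comma.
            record[key] = [int(x) for x in record[key].split(",") if x]
    return record


rec = parse_bed_line("chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512")
```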
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters::
    >sequence1
    atgcgtttgcgtgc
    gtcggtttcgttgc
    >sequence2
    tttcgtgcgtatag
    tggcgcggtga
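A reader for the layout described above (a ">" description line followed by one or more sequence lines) could look like the following Python sketch. The function name `parse_fasta` is hypothetical, and the sketch uses the whole description line as the record name, which is a simplifying assumption.

```python
def parse_fasta(text: str) -> dict:
    """Return a mapping from description line (without '>') to full sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line starts a new record.
            name = line[1:]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first description line")
        else:
            records[name].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {n: "".join(parts) for n, parts in records.items()}


example = """>sequence1
atgcgtttgcgtgc
gtcggtttcgttgc
>sequence2
tttcgtgcgtatag
tggcgcggtga
"""
seqs = parse_fasta(example)
```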
FastqSolexa is the Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file::
    @seq1
    GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT
    +seq1
    hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh
    @seq2
    GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG
    +seq2
    hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO

Or::

    @seq1
    GAATTGATCAGGACATAGGACAACTGTAGGCACCAT
    +seq1
    40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4
    @seq2
    GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG
    +seq2
    40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9
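The two quality encodings shown above (one character per base, or space-separated integers) can be read with a sketch like the one below. The names `decode_quality` and `parse_fastq` are hypothetical. The ASCII offset of 64 is an assumption based on the usual Solexa/early Illumina convention, and it matches the examples, where "h" lines up with a score of 40. Telling the two encodings apart by looking for a space is a heuristic that fails for a one-base read with a numeric score.

```python
def decode_quality(qual_line: str, offset: int = 64) -> list:
    """Decode one quality line into a list of integer scores."""
    qual_line = qual_line.strip()
    if " " in qual_line:
        # Space-separated decimal scores.
        return [int(q) for q in qual_line.split()]
    # One ASCII character per base; offset 64 is an assumption.
    return [ord(ch) - offset for ch in qual_line]


def parse_fastq(text: str) -> list:
    """Return (name, sequence, scores) tuples from four-line records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (ln.strip() for ln in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        scores = decode_quality(qual)
        if len(scores) != len(seq):
            raise ValueError("sequence and quality lengths differ")
        records.append((header[1:], seq, scores))
    return records


# Short invented records, one per encoding.
ascii_records = parse_fastq("@seq1\nGATTA\n+seq1\nhhPW@\n")
numeric_records = parse_fastq("@seq2\nGATTA\n+seq2\n40 40 16 23 0\n")
```

Reading records in fixed groups of four lines avoids misreading a quality line that happens to begin with "@" as a new record header.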
Also known as the FBAT format, for use in the FBAT program. It consists of a pedigree file and a phenotype file.
This format is an HTML web page. Click the eye icon to view the dataset in your browser.
    #CHROM START END  STRAND NAME COMMENT
    chr1   10    100   +     exon myExon
    chrX   1000  10050 -     gene myGene
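Because an interval file may place its chromosome, start, and end columns anywhere, one way to read it is to locate the columns from the header line. The Python sketch below does that for the example above; `parse_intervals` is a hypothetical helper, and the header names CHROM, START, END, and STRAND are taken from the example rather than from a fixed rule. It splits on any whitespace because the example is shown that way; for a real tab-delimited file, splitting on tabs would keep comments containing spaces intact.

```python
def parse_intervals(text: str) -> list:
    """Read interval rows, using the '#' header line to find the key columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    col = {name.upper(): i for i, name in enumerate(header)}
    for required in ("CHROM", "START", "END"):
        if required not in col:
            raise ValueError("header is missing the %s column" % required)
    rows = []
    for line in lines[1:]:
        fields = line.split()
        row = {
            "chrom": fields[col["CHROM"]],
            "start": int(fields[col["START"]]),
            "end": int(fields[col["END"]]),
        }
        # The strand column is optional.
        if "STRAND" in col:
            row["strand"] = fields[col["STRAND"]]
        rows.append(row)
    return rows


example = """#CHROM START END STRAND NAME COMMENT
chr1 10 100 + exon myExon
chrX 1000 10050 - gene myGene
"""
intervals = parse_intervals(example)
```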
LAV is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.
This is the linkage pedigree format (separate map and ped files). Together, these files describe SNPs: the map file holds the position and an identifier for each SNP, and the pedigree file holds the alleles. To upload this format into Galaxy, do not use auto-detect for the file format; select lped instead. You will then be given two sections for uploading files, one for the pedigree file and one for the map file. For more information, see linkage pedigree, map, or ped.
TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-space-separated "variable=value pairs". There should be no white space surrounding the "=". Click here for more about MAF format.
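The header rule above (the word ##maf followed by whitespace-separated variable=value pairs, with no white space around "=") can be sketched in Python as follows. The function name `parse_maf_header` is hypothetical, and the variable names and values in the sample line are placeholders, not a statement of which variables a real MAF file carries.

```python
def parse_maf_header(first_line: str) -> dict:
    """Parse the first line of a MAF file into a dict of variable=value pairs."""
    prefix = "##maf"
    if not first_line.startswith(prefix):
        raise ValueError("first line must begin with ##maf")
    pairs = {}
    for token in first_line[len(prefix):].split():
        key, sep, value = token.partition("=")
        # A token without '=' or with an empty side means stray white space.
        if not sep or not key or not value:
            raise ValueError("malformed pair: %r" % token)
        pairs[key] = value
    return pairs


# Placeholder values for illustration.
header = parse_maf_header("##maf version=1 scoring=example")
```

A line such as `##maf version = 1` is rejected, because white space around "=" splits the pair into separate tokens.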
This is the binary version of the lped file format.
PSL format describes alignments and is returned by BLAT. It does not include any sequence.
A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading the file. Click here for more information.
A binary file in 'Standard Flowgram Format' with a '.sff' file extension.
Text delimited into columns by something other than a tab.
Any data in tab-delimited format (tabular).
A zipped archive consisting of flat text sequence files. All files in this archive must have the same file extension of '.txt'. You must manually select this 'File Format' when uploading the file.
The wiggle format is line-oriented. Wiggle data is preceded by a track definition line, which gives the type of wiggle. There are three different types, each with its own uses. More information here.
Any text file.