Phenotype Association Tools in Galaxy

Galaxy is a software framework that provides web-based tools for bioinformatics, including tasks useful in the analysis of human variation. The developers maintain a public server at Penn State, and the software is also freely available for local installation and customization. Galaxy is highly extensible, so as new tools become available (not necessarily written specifically for Galaxy) they can be added to increase the power and flexibility of the system.

This tutorial focuses on some of the tools available on the public Galaxy server that are useful for exploring possible associations between human genetic variants and phenotypes. It walks step by step through several examples. For a more general introduction to using Galaxy, please see the documentation available at galaxyproject.org.

Basics:  A brief orientation to the fundamentals of using Galaxy

Example 1:  Using Galaxy to look for disease SNPs in a full genome

This example illustrates several methods for examining a single full-coverage genome to look for single-nucleotide polymorphisms (SNPs) that are either known to be associated with disease or suspected to have an impact for other reasons. It makes use of public genomic data, tools designed specifically for working with variants, and some general tools for text manipulation and operations on genomic coordinates.
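
To give a feel for what an operation on genomic coordinates involves, here is a minimal Python sketch of matching a genome's SNPs against a table of known disease-associated positions. It is not the Galaxy tools themselves: the function name, the record layout, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layouts, assumed for this sketch only.
Position = Tuple[str, int]          # (chromosome, 1-based position)
GenomeSnp = Tuple[str, int, str]    # (chromosome, position, observed allele)


def find_known_disease_snps(
    genome_snps: List[GenomeSnp],
    known: Dict[Position, str],
) -> List[Tuple[str, int, str, str]]:
    """Return the genome SNPs whose coordinates appear in the known-SNP table.

    Each hit is (chromosome, position, observed allele, annotation).
    """
    hits = []
    for chrom, pos, allele in genome_snps:
        annotation = known.get((chrom, pos))
        if annotation is not None:
            hits.append((chrom, pos, allele, annotation))
    return hits


# Made-up example data (not real variants).
genome = [("chr1", 1000, "A"), ("chr2", 2500, "T"), ("chr7", 4200, "G")]
known_table = {("chr2", 2500): "example disease A", ("chr9", 100): "example disease B"}

matches = find_known_disease_snps(genome, known_table)
```

In Galaxy the same idea is carried out with the join and interval tools on whole datasets, so no programming is required.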

Example 2:  Using Galaxy to look for SNPs differing between populations

This example illustrates methods for comparing two populations to look for fixed differences between them. It starts with publicly available SNP data and a known phenotype-associated SNP, and looks for the SNP in the same manner as you would in a case-control study. This example uses both general tools and some tools specifically intended for working with variants.
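
As a rough illustration of what a fixed difference means, the following Python sketch flags SNPs where every sampled individual in one population carries a single allele and every sampled individual in the other population carries a different single allele. The function names, the data layout, and the genotypes are hypothetical; the Galaxy tools used in the example work on full datasets and may apply different criteria.

```python
from typing import Dict, List, Set

# Hypothetical layout: SNP identifier -> list of genotypes, one per individual,
# where each genotype is a two-letter string such as "AG".
Genotypes = Dict[str, List[str]]


def allele_set(genotypes: List[str]) -> Set[str]:
    """Collect every allele observed across a list of genotypes."""
    return {allele for genotype in genotypes for allele in genotype}


def fixed_differences(pop_a: Genotypes, pop_b: Genotypes) -> List[str]:
    """Return SNPs where each population has exactly one allele and they differ."""
    fixed = []
    for snp in sorted(pop_a.keys() & pop_b.keys()):
        alleles_a = allele_set(pop_a[snp])
        alleles_b = allele_set(pop_b[snp])
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            fixed.append(snp)
    return fixed


# Made-up genotypes for three SNPs in two small samples.
population_1 = {"snp1": ["AA", "AA"], "snp2": ["AG", "AA"], "snp3": ["CC", "CC"]}
population_2 = {"snp1": ["GG", "GG"], "snp2": ["GG", "GG"], "snp3": ["CC", "CC"]}

result = fixed_differences(population_1, population_2)
```

Here only snp1 qualifies: snp2 is still variable in the first population, and snp3 is identical in both.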

Example 3:  Using Galaxy to look for disease-associated SNPs in a pedigree

This example illustrates methods for looking for disease-associated SNPs in full-coverage genomes in a family. It uses the CEPH pedigree genomes provided by Complete Genomics, plus planted disease SNPs from the CFMDB database.

Example 4:  Using Galaxy to look for population structure and selective sweeps

This example illustrates using low-coverage sequence data with tools for examining population structure and detecting selective sweeps. This is an intermediate-level example, and assumes you already have some of the basic Galaxy skills, such as importing datasets, that are covered in the earlier examples.

Conventions used in this tutorial:

Red arrows or boxes on the screenshots indicate settings to change or actions you will need to take. Green arrows mark the "go" buttons to click once the settings or parameters are selected. Blue arrows or boxes point out additional information that you should note, but they don't require any action.


Funding for our work on assembling and documenting the Phenotype Association tools was provided by NIH grant UL1 RR033184-01 to the Penn State Clinical and Translational Science Institute. This project is funded, in part, under a grant with the Pennsylvania Department of Health using Tobacco CURE Funds. The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions.