Human Variation Tools in Galaxy
Galaxy is a software framework that provides web-based tools for
bioinformatics, including tasks useful in the analysis of human
variation. The developers maintain a public server at Penn State, and
the software is also freely available for local installation and
customization. Galaxy is highly extensible: as new tools become
available (including ones not written specifically for Galaxy), they
can be added to increase the power and flexibility of the system.
The core concept of Galaxy's working paradigm is the user
history, which is essentially a list of your datasets at various
stages of analysis. Each history item includes not only the dataset
resulting from a particular computation, but also meta-information
about it, such as the file format, genome build, and (if applicable)
the tool and parameters used to obtain this dataset from earlier ones.
Thus each analysis consists of a chain (or branched network) of steps,
which are documented in your history for reproducibility. Moreover,
the sequence of tools and parameters used can be extracted and saved
as a workflow, independent of the actual data, that can then
be rerun automatically on additional input datasets of the same type.
You can keep separate histories for each project, and even share
histories and workflows with other users.
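To make the history idea more concrete, here is a minimal Python sketch of how a history of this kind could be modeled. Each item stores a dataset together with its meta-information: file format, genome build, and the tool, parameters, and input items that produced it. A workflow is extracted by keeping only the tools, parameters, and wiring between steps while dropping the data. That recipe is then replayed on a new input of the same type. All class, function, and field names (HistoryItem, History, extract_workflow, rerun) and the toy tools select_chrom and count_lines are hypothetical and chosen for illustration. This is not Galaxy's internal data model or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified model of a Galaxy-style history. Names are
# illustrative only; this is not Galaxy's internal data model or API.

Tool = Callable[[List[List[str]], Dict[str, str]], List[str]]
Step = Tuple[int, str, Dict[str, str], List[int]]


@dataclass
class HistoryItem:
    hid: int                     # position in the history: 1, 2, 3, ...
    data: List[str]              # the dataset itself (here, lines of text)
    file_format: str             # meta-information: file format
    genome_build: str            # meta-information: genome build, e.g. "hg18"
    tool: Optional[str] = None   # producing tool; None for uploaded data
    params: Dict[str, str] = field(default_factory=dict)
    inputs: List[int] = field(default_factory=list)  # hids of input items


@dataclass
class History:
    items: List[HistoryItem] = field(default_factory=list)

    def upload(self, data: List[str], file_format: str, genome_build: str) -> HistoryItem:
        item = HistoryItem(len(self.items) + 1, list(data), file_format, genome_build)
        self.items.append(item)
        return item

    def run(self, tool: str, tools: Dict[str, Tool], params: Dict[str, str],
            inputs: List[int]) -> HistoryItem:
        # Apply a tool to earlier items and record how the result was made.
        # Simplification: format and build are copied from the first input.
        sources = [self.items[h - 1] for h in inputs]
        result = tools[tool]([s.data for s in sources], params)
        item = HistoryItem(len(self.items) + 1, result, sources[0].file_format,
                           sources[0].genome_build, tool, dict(params), list(inputs))
        self.items.append(item)
        return item

    def extract_workflow(self) -> List[Step]:
        # Keep tools, parameters and wiring between steps; drop the data.
        return [(i.hid, i.tool, dict(i.params), list(i.inputs))
                for i in self.items if i.tool is not None]


def rerun(workflow: List[Step], new_inputs: Dict[int, Tuple[List[str], str]],
          tools: Dict[str, Tool], genome_build: str) -> History:
    """Replay a workflow; new_inputs maps each original upload hid to (data, format)."""
    history = History()
    hid_map: Dict[int, int] = {}
    for old_hid, (data, fmt) in sorted(new_inputs.items()):
        hid_map[old_hid] = history.upload(data, fmt, genome_build).hid
    for old_hid, tool, params, old_inputs in workflow:
        item = history.run(tool, tools, params, [hid_map[h] for h in old_inputs])
        hid_map[old_hid] = item.hid
    return history


# Toy stand-ins for real Galaxy tools (hypothetical).
TOOLS: Dict[str, Tool] = {
    "select_chrom": lambda ds, p: [line for line in ds[0]
                                   if line.split("\t")[0] == p["chrom"]],
    "count_lines": lambda ds, p: [str(len(ds[0]))],
}

h = History()
snps = h.upload(["chr1\t100\trs1", "chr2\t200\trs2", "chr1\t300\trs3"], "tabular", "hg18")
on_chr1 = h.run("select_chrom", TOOLS, {"chrom": "chr1"}, [snps.hid])
total = h.run("count_lines", TOOLS, {}, [on_chr1.hid])
wf = h.extract_workflow()

h2 = rerun(wf, {1: (["chr1\t5\trs9", "chrX\t7\trs8"], "tabular")}, TOOLS, "hg18")
print(total.data, h2.items[-1].data)  # ['2'] ['1']
```

Each step records the hids of its inputs rather than the data itself. As a result, the extracted workflow captures branched networks as well as simple chains, and it can be replayed on any new dataset of the same type.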
This tutorial focuses on some of the tools available on the public
Galaxy server that are useful for analyzing human variation. We trace
step by step through an example illustrating several methods for
examining a single full-coverage genome to look for SNPs that are
known to be associated with disease. It makes use of public genomic
data, tools designed specifically for working with variants, and also
some general tools for text manipulation and operations on genomic
coordinates. For a more basic introduction to using Galaxy in
general, please see Unit [number], "[title]", or the documentation
available at galaxyproject.org.