Example 3:  Using Galaxy to look for disease SNPs in a pedigree

For this example we will use an artificial set of disease SNPs from the CFMDB database. The SNPs are real, but they would not necessarily all occur in one family. Three SNPs were chosen to cover different parts of the gene: one in the coding sequence, one at a splice site, and one in the promoter. The disease-associated SNPs are planted in five genomes from the Complete Genomics CEPH pedigree. This gives a realistic background for the SNPs, as well as realistic results when the genomes and the pedigree information are used to filter them. These disease SNPs were chosen because cystic fibrosis is a good example of a recessively inherited disease and is found in the CEU population.

Disease SNPs planted in the sample dataset

chr7 117119336 117119337 G promoter
chr7 117144340 117144341 T exon
chr7 117174423 117174424 A splicing
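
As a rough illustration, the three lines above can be read as BED-style intervals. The Python sketch below is not part of the Galaxy workflow. It assumes the columns are chromosome, start, end, planted allele, and gene region; those column names, and the PlantedSnp and parse_snps helpers, are labels introduced here, not something the dataset defines.

```python
from typing import List, NamedTuple


class PlantedSnp(NamedTuple):
    # Field names are assumptions about what each column means.
    chrom: str
    start: int
    end: int
    allele: str
    region: str


# The three planted SNP lines, copied from the list above.
RAW = """\
chr7 117119336 117119337 G promoter
chr7 117144340 117144341 T exon
chr7 117174423 117174424 A splicing
"""


def parse_snps(text: str) -> List[PlantedSnp]:
    """Parse whitespace-separated interval lines into PlantedSnp records."""
    snps = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, allele, region = line.split()
        snps.append(PlantedSnp(chrom, int(start), int(end), allele, region))
    return snps


snps = parse_snps(RAW)
for snp in snps:
    # Each interval spans exactly one base (end - start == 1).
    print(snp.chrom, snp.start, snp.end, snp.allele, snp.region)
```

Keeping the region label next to each interval makes it easy to check later which of the planted SNPs (promoter, exon, splicing) survive a given filtering step.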

Genomes used for pedigree

NA12877 father
NA12878 mother
NA12879 daughter
NA12880 daughter
NA12882 son
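
To make the idea of filtering by pedigree and recessive inheritance concrete, here is a minimal Python sketch of the rule. It is not the Galaxy tool used in this example. The fits_recessive function, the choice of which child is affected, and the allele counts are all hypothetical values made up for illustration; the sample dataset defines its own assignments.

```python
from typing import Dict, Set

# Pedigree members as listed above.
PEDIGREE = {
    "NA12877": "father",
    "NA12878": "mother",
    "NA12879": "daughter",
    "NA12880": "daughter",
    "NA12882": "son",
}
PARENTS = {"NA12877", "NA12878"}


def fits_recessive(copies: Dict[str, int], affected: Set[str],
                   parents: Set[str]) -> bool:
    """Return True if a SNP is consistent with simple recessive inheritance.

    copies maps each sample to the number of disease-allele copies (0, 1, 2).
    Affected individuals must carry two copies, unaffected parents must be
    carriers (one copy), and other unaffected individuals must not carry two.
    """
    for sample, n in copies.items():
        if sample in affected:
            ok = n == 2
        elif sample in parents:
            ok = n == 1
        else:
            ok = n < 2
        if not ok:
            return False
    return True


# Hypothetical scenario: one daughter is affected (an assumption, not from
# the dataset), and the allele counts below are invented for the sketch.
affected = {"NA12879"}
consistent = {"NA12877": 1, "NA12878": 1, "NA12879": 2,
              "NA12880": 1, "NA12882": 0}
inconsistent = dict(consistent, NA12880=2)  # unaffected child with two copies

print(fits_recessive(consistent, affected, PARENTS))
print(fits_recessive(inconsistent, affected, PARENTS))
```

A SNP that fails this check in any family member can be discarded, which is why a pedigree narrows the candidate list much faster than a single genome does.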

This example builds a single sequential history, but there are links to specific parts if you are interested in just one section. The later parts do not go into as much detail when a similar step was covered in an earlier part, so if you are unfamiliar with Galaxy it is best to work through the full example.

Part 1:  Preparing input data.

Part 2:  Using the pedigree and recessive inheritance to filter SNPs.

Part 3:  Removing SNPs found in healthy controls.

Part 4:  Finding SNPs that are likely to be phenotype associated.

Part 5:  Using known gene-disease associations.