Example 2:  Using Galaxy to look for SNPs differing between populations

For this example we will use publicly available SNP data from the 1000 Genomes March 2012 release and look for SNPs that differ between populations. In a real analysis you will generally not know in advance which SNP you are looking for, but here we use a known SNP to demonstrate the approach. We chose a SNP associated with the light skin color of Caucasians (Lamason et al. 2005), because this is a phenotype that is available for public datasets.

This example builds a single sequential history, but links to the individual parts are provided in case you are interested in just one of them. The parts illustrate the following skills:

Part 1:  Creating a gd_snp (Genome Diversity SNP) dataset from public data in pgSnp (Personal Genome SNP) format.

Part 2:  Finding SNPs that differ between populations and filtering for those that may be phenotype-associated.

Reference:  Lamason RL, Mohideen MA, Mest JR, Wong AC, Norton HL, Aros MC, Jurynec MJ, Mao X, Humphreville VR, Humbert JE, Sinha S, Moore JL, Jagadeeswaran P, Zhao W, Ning G, Makalowska I, McKeigue PM, O'Donnell D, Kittles R, Parra EJ, Mangini NJ, Grunwald DJ, Shriver MD, Canfield VA, Cheng KC (December 2005).  SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.  Science 310 (5755): 1782-1786. doi:10.1126/science.1116238. PMID 16357253