Homework Assignment #2


Look for genomic regions that might be regulatory sites, using Galaxy and the UCSC Browser. Start at Galaxy.

Step 1: Upload a custom track of intervals that were predicted by Blanchette et al. (2006) to be regulatory regions, noting that the coordinates in the human genome are relative to an outdated assembly, called hg17. To do this, perform the following operations. Click on "Get Data" (upper left), then on "Upload File". Then paste this URL into the window:

http://www.bx.psu.edu/~ross/share/PReMod_hg17.bed.txt

To tell Galaxy that this comes from build hg17 (not the current assembly), under "Genome" select hg17. Finally, click "Execute".

Step 2: Convert the annotation to the current human assembly, hg18, by selecting "Lift-Over" and choosing hg18 as the assembly to convert the coordinates to.

Step 3: Upload an annotation of all highly conserved regions in hg18 by making the following choices: Get Data -> UCSC Main table browser, then select group: Comparative Genomics, track: 28-Way Most Cons, region: genome, and send output to: Galaxy. Click "get output", then "Send query to Galaxy". (It is a good idea to look at every set of data that you give Galaxy.)

Step 4: Find which predicted regulatory regions are highly conserved among mammals by choosing: Operate on Genomic Intervals -> Intersect the intervals of two queries. Make sure you are asking for the intervals from Step 2 that intersect an interval from Step 3, not the converse.
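
To see why the order of the two queries matters, here is a minimal Python sketch of an interval intersection. It is not Galaxy's implementation; the function names parse_bed and intersecting are hypothetical, the sample coordinates are made up, and it assumes BED-style input (chromosome, 0-based start, exclusive end) and returns whole intervals from the first set that overlap the second by at least one base.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-style lines into (chrom, start, end) tuples.

    Blank lines and track/browser/comment lines are skipped.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def intersecting(first, second):
    """Return the intervals of `first` that overlap any interval of `second`.

    Intervals are half-open [start, end), so two intervals that merely
    touch (one ends where the other starts) do not overlap.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in second:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in first:
        others = by_chrom.get(chrom, ())
        if any(start < o_end and o_start < end for o_start, o_end in others):
            kept.append((chrom, start, end))
    return kept


# Made-up example data, not taken from the assignment files.
predicted = parse_bed(["chr16\t100\t200", "chr16\t500\t600", "chr1\t100\t200"])
conserved = parse_bed(["chr16\t150\t160", "chr16\t600\t700"])

print(intersecting(predicted, conserved))  # predicted regions that are conserved
print(intersecting(conserved, predicted))  # the converse: a different answer
```

In this toy data the first call keeps the predicted region chr16:100-200, while the converse call keeps the conserved element chr16:150-160, which is why Step 4 asks you to check which query comes first.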

Step 5: View the intervals in the UCSC Browser by clicking on the result of the intersection operation in the history panel (far right) and then clicking on "Display at UCSC main".

Write (in a plain text file) a short paragraph describing the number and location of these conserved putative regulatory regions around the alpha-globin gene cluster, say chr16:1-300000. Send your report by email to Qingyu Wang (qzw102@psu.edu) by noon on Thursday, September 2. I'll be available Wednesday to answer questions, at webb@bx.psu.edu.
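
If you download the intersection result and want to check your count for a region, a short Python sketch like the following may help. It is only an illustration: the function name in_region is hypothetical and the sample intervals are made up. It assumes the intervals are BED-style (0-based start, exclusive end), whereas a browser position such as chr16:1-300000 is 1-based and includes both endpoints.

```python
def in_region(intervals, chrom, region_start, region_end):
    """Return intervals overlapping a browser-style region.

    `intervals` holds BED-style (chrom, start, end) tuples: 0-based start,
    exclusive end. `region_start` and `region_end` are 1-based and inclusive,
    as typed into the browser position box.
    """
    lo = region_start - 1  # convert the 1-based start to 0-based
    hi = region_end        # an inclusive 1-based end equals an exclusive 0-based end
    return [
        (c, s, e)
        for c, s, e in intervals
        if c == chrom and s < hi and lo < e
    ]


# Made-up intervals for illustration only.
hits = [
    ("chr16", 0, 50),
    ("chr16", 299999, 300500),
    ("chr16", 300000, 300100),
    ("chr2", 10, 20),
]

selected = in_region(hits, "chr16", 1, 300000)
print(len(selected), "intervals overlap chr16:1-300000")
for chrom, start, end in selected:
    print(chrom, start, end)
```

This version counts any interval that overlaps the region, even partially; requiring intervals to lie entirely inside the region would be an equally reasonable choice, so say which one you used in your report.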