Course Materials for Bioinformatics I
Fall 2007
Resources:
2006 NAR Database Issue
2006 NAR Web Server Issue
2007 NAR Database Issue
2007 NAR Web Server Issue
Lecture materials:
- Introduction to Bioinformatics I
- Basic Internet resources for bioinformatics
- Automated searching of Internet resources
- A few possible topics for term projects. Project #1 need not
involve any programming. Projects #2 and #3 could consist entirely of reading
and summarizing a few papers on the topic.
Projects #1 and #2 could set the stage for
long-term algorithm/software development projects.
- Select human gene clusters to be sequenced in various primates,
using aCGH (array comparative genomic hybridization) data.
- Paper
about the genome sequence of rhesus macaque
(see especially discussions of aCGH and the PRAME gene cluster).
- Paper
about aCGH in primates
- Paper
about comparative sequencing of a gene cluster in several primates.
- How are experimental data that indicate functional genomic intervals
related to interspecies conservation and motifs in the sequence?
- Paper
on the ENCODE project.
- Paper
giving genome-wide data on binding sites for the CTCF protein.
- Paper
relating functional regions to sequence conservation.
- Paper giving genome-wide computational
predictions of regulatory modules.
- Why is the platypus genome G+C-rich (> 45%) while the opossum
genome is G+C-poor (< 38%)?
- Picture of G+C distributions
in 10-kb windows for a few vertebrates.
- Paper on the genome of
Monodelphis domestica (South American short-tailed opossum).
- Paper reporting a fourfold to eightfold
increase in CpG substitution rate about 90 million years ago.
- Two papers on biased gene conversion and G+C content:
#1,
#2.
- Evolution of interspersed repeats in the woolly mammoth genome.
- Paper about the human genome sequence, with
a very good discussion of interspersed repeats (see pp. 879-888).
- Paper about sequencing woolly mammoth.
- Paper
about endogenous retroviruses in woolly mammoth.
- Paper
about endogenous retroviruses in elephant.
- A programmer interface to the 28-way alignment at the UCSC Browser.
- Draft
of paper on the 28-way alignment (to appear in Genome Research)
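For the G+C project above, a first step might be to tabulate G+C content in fixed-size windows, as in the picture of 10-kb windows. The following is only a minimal sketch under stated assumptions: the function name gc_fraction_windows is hypothetical, windows are non-overlapping, a trailing partial window is dropped, and bases other than A, C, G, T (such as N) are excluded from the denominator.

```python
def gc_fraction_windows(seq, window=10_000):
    """Return the G+C fraction of each non-overlapping window of seq.

    Assumptions (not from the course page): a trailing partial window is
    dropped, and only A/C/G/T count toward the denominator. A window with
    no A/C/G/T at all yields None.
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            fractions.append(None)  # window is entirely N or gap characters
            continue
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / acgt)
    return fractions
```

A histogram of the returned values for each species would give a rough analogue of the G+C distribution picture; reading the sequence from a FASTA file is left out of the sketch.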
- Potentially useful Internet resources
- Known transcription factor binding sites
- Determine transcription factor binding sites from co-expressed genes (this can also be done at some of the websites listed above)
- Database of experimentally confirmed regulatory elements
- Website for class notes from the Jones/Pevzner book.
Follow links Powerpoint Slides -> Chapter 4 -> Brute Force Motif Searching.
- Same as previous lecture.
- Website for class notes from the Jones/Pevzner book.
Follow links Powerpoint Slides -> Chapter 5.
- Same as previous lecture.
- Weight matrices.
- Notes on probabilistic sequence models.
- Survey by Gary Stormo.
- Notes on sorting by reversals and weight matrices.
- More on probabilistic sequence models.
- Notes on why we are messing with this log-odds stuff.
- Website for class notes from the Jones/Pevzner book.
Follow links Powerpoint Slides -> Chapter 11 (HMMs).
- More on HMMs
- See Figure 11.5 (p. 299) of Jones and Pevzner for a profile HMM.
- Genscan paper. See especially Fig. 3 (p. 86).
- Notes on Hidden Markov Models.
- Nov. 13
- Study guide for the "non-phylogenetic" part of the exam.
- Website for class notes from the Jones/Pevzner book.
Follow links Powerpoint Slides -> Chapter 8 -> Graphs and DNA Sequencing.
- In Jones/Pevzner read about:
- NP-complete problems, pages 49-51.
- graphs, and the Eulerian and Hamiltonian Cycle Problems,
pages 248-258.
- the Shortest Superstring Problem, pages 264-265.
- Sequencing by Hybridization, and two solutions, pages 268-275.
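
As a companion to the reading on Sequencing by Hybridization, here is a rough sketch of the Eulerian-path approach: each l-mer in the spectrum becomes an edge from its (l-1)-mer prefix to its (l-1)-mer suffix, and a path that uses every edge once spells a candidate sequence. The function name reconstruct_from_spectrum is hypothetical, the path is found with Hierholzer's algorithm (one standard choice, not necessarily the presentation used in the book), and the sketch assumes a non-empty, error-free spectrum.

```python
from collections import defaultdict


def reconstruct_from_spectrum(kmers):
    """Spell one sequence whose l-mer spectrum equals kmers.

    Builds a graph whose nodes are (l-1)-mers and whose edges are the
    l-mers, then finds an Eulerian path with Hierholzer's algorithm.
    Raises ValueError if no path uses every l-mer exactly once.
    """
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        outdeg[prefix] += 1
        indeg[suffix] += 1

    # An Eulerian path starts at the node with one more outgoing than
    # incoming edge; if the graph is balanced, any node with edges works.
    nodes = set(indeg) | set(outdeg)
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), None)
    if start is None:
        start = kmers[0][:-1]

    # Hierholzer's algorithm with an explicit stack.
    stack = [start]
    path = []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not admit an Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, reconstruct_from_spectrum(["ATG", "TGC"]) returns "ATGC". When the graph has several Eulerian paths, the function returns just one of them, so the reconstructed sequence is not necessarily the original one.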
You might find it amusing to read the
story of how I got involved in bioinformatics.