Course Materials for Bioinformatics I
Fall 2007


Resources:

  • 2006 NAR Database Issue
  • 2006 NAR Web Server Issue
  • 2007 NAR Database Issue
  • 2007 NAR Web Server Issue

    Lecture materials:

    1. Introduction to Bioinformatics I
    2. Basic Internet resources for bioinformatics
    3. Automated searching of Internet resources
    4. A few possible topics for term projects. Project #1 need not involve any programming. Projects #2 and #3 could consist entirely of reading and summarizing a few papers on the topic. Projects #1 and #2 could set the stage for long-term algorithm/software development projects.
      1. Select human gene clusters to be sequenced in various primates, using aCGH (array comparative genomic hybridization) data.
        • Paper about the genome sequence of rhesus macaque (see especially discussions of aCGH and the PRAME gene cluster).
        • Paper about aCGH in primates
        • Paper about comparative sequencing of a gene cluster in several primates.
      2. How are experimental data that indicate functional genomic intervals related to interspecies conservation and motifs in the sequence?
        • Paper on the ENCODE project.
        • Paper giving genome-wide data on binding sites for the CTCF protein.
        • Paper relating functional regions to sequence conservation.
        • Paper giving genome-wide computational predictions of regulatory modules.
      3. Why is the platypus genome G+C-rich (> 45%) while the opossum genome is G+C-poor (< 38%)?
        • Picture of G+C distributions in 10-kb windows for a few vertebrates.
        • Paper on the genome of Monodelphis domestica (South American short-tailed opossum).
        • Paper reporting a fourfold to eightfold increase in CpG substitution rate about 90 million years ago.
        • Two papers on biased gene conversion and G+C content: #1, #2.
      4. Evolution of interspersed repeats in the woolly mammoth genome.
        • Paper about the human genome sequence, with a very good discussion of interspersed repeats (see pp. 879-888).
        • Paper about sequencing woolly mammoth.
        • Paper about endogenous retroviruses in woolly mammoth.
        • Paper about endogenous retroviruses in elephant.
      5. A programmer interface to the 28-way alignment at the UCSC Browser.
        • Draft of paper on the 28-way alignment (to appear in Genome Research)
    5. Potentially useful Internet resources
    6. Website for class notes from the Jones/Pevzner book. Follow links Powerpoint Slides -> Chapter 4 -> Brute Force Motif Searching.
    7. Same as previous lecture.
    8. Website for class notes from the Jones/Pevzner book. Follow links Powerpoint Slides -> Chapter 5.
    9. Same as previous lecture.
    10. Weight matrices.
    11. Notes on sorting by reversals and weight matrices.
    12. More on probabilistic sequence models.
      • Notes on why we are messing with this log-odds stuff.
      • Website for class notes from the Jones/Pevzner book. Follow links Powerpoint Slides -> Chapter 11 (HMMs).
    13. More on HMMs
      • See Figure 11.5 (p. 299) of Jones and Pevzner for a profile HMM.
      • Genscan paper. See especially Fig. 3 (p. 86).
      • Notes on Hidden Markov Models.
    14. Nov. 13
      • Study guide for the "non-phylogenetic" part of the exam.
      • Website for class notes from the Jones/Pevzner book. Follow links Powerpoint Slides -> Chapter 8 -> Graphs and DNA Sequencing.
      • In Jones/Pevzner read about:
        • NP-complete problems, pages 49-51.
        • graphs, and the Eulerian and Hamiltonian Cycle Problems, pages 248-258.
        • the Shortest Superstring Problem, pages 264-265.
        • Sequencing by Hybridization, and two solutions, pages 268-275.
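
    As a study aid for the Sequencing by Hybridization reading, here is a minimal sketch of the Eulerian-path approach: each l-mer in the spectrum becomes an edge from its (l-1)-mer prefix to its (l-1)-mer suffix, and a path that uses every edge exactly once spells a candidate sequence. The function name sbh_reconstruct and the example spectrum are illustrative assumptions, not taken from the course notes or the book, and the sketch returns only one of possibly several valid reconstructions.

```python
from collections import defaultdict


def sbh_reconstruct(spectrum):
    """Return one string whose l-mer composition equals `spectrum`.

    Hypothetical helper for illustration: builds a graph on (l-1)-mers
    and walks an Eulerian path with Hierholzer's algorithm.
    """
    if not spectrum:
        raise ValueError("empty spectrum")

    graph = defaultdict(list)   # (l-1)-mer -> list of successor (l-1)-mers
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for lmer in spectrum:
        prefix, suffix = lmer[:-1], lmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # An Eulerian path starts at the node with one surplus outgoing edge;
    # if every node is balanced (Eulerian cycle), any node will do.
    start = next((n for n, b in balance.items() if b == 1), spectrum[0][:-1])

    # Iterative Hierholzer: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the sequence: first node, then the last letter of each later node.
    sequence = path[0] + "".join(node[-1] for node in path[1:])

    # Verify the result, since the walk is only valid if the graph
    # really has an Eulerian path.
    l = len(spectrum[0])
    composition = [sequence[i:i + l] for i in range(len(sequence) - l + 1)]
    if sorted(composition) != sorted(spectrum):
        raise ValueError("spectrum has no Eulerian path")
    return sequence


if __name__ == "__main__":
    example = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
    print(sbh_reconstruct(example))
```

    Finding an Eulerian path takes time linear in the number of l-mers, which is why this formulation is preferred over the Hamiltonian-path formulation, where each l-mer is a vertex and no efficient algorithm is known.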

    You might find it amusing to read the story of how I got involved in bioinformatics.