Round 8 Segmentation-vs-GencodeV7 vs Same-Cell-Line RNASeq Contigs Plots

Plots

A comparison of round 8 segmentation classes vs same-cell-line RNASeq contigs.

Comparison plots (segmentation classes vs. RNASeq contigs). Each bar plot shows the log2 enrichment (or depletion) of RNASeq contigs that intersect each class.

View the bar plots here: Bar Plots

Round 8 Segmentation-vs-GencodeV7 Plots

Plots

A comparison of round 8 Segway and ChromHmm classes vs annotated regions (features).

Heat maps (segmentation classes vs. features). Each heatmap column shows the log2 enrichment (or depletion) of that column's feature in each class. “Occupied” is calculated base-per-base (the number of overlapping bases).

View the heatmaps here: Heat maps
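As a rough illustration of the base-per-base calculation, the sketch below counts the bases shared by a class and a feature and converts the result to a log2 enrichment. The exact formula is not given on this page; the sketch assumes enrichment is the fraction of the class occupied by the feature divided by the fraction of the genome occupied by the feature. The function names, the half-open coordinates on a single sequence, and the `genome_size` parameter are illustrative assumptions.

```python
import math


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def total_bases(intervals):
    """Number of distinct bases covered by an interval set."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bases(a, b):
    """Count bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def log2_enrichment(class_intervals, feature_intervals, genome_size):
    """Assumed definition: log2(class fraction / genome-wide fraction)."""
    occupied = overlap_bases(class_intervals, feature_intervals)
    if occupied == 0:
        return float("-inf")  # feature absent from the class
    class_fraction = occupied / total_bases(class_intervals)
    genome_fraction = total_bases(feature_intervals) / genome_size
    return math.log2(class_fraction / genome_fraction)
```

Positive values indicate enrichment of the feature in the class and negative values indicate depletion.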

Round 8 Segmentation-vs-GencodeV7 TSS

Plots

A comparison of round 8 Segway and ChromHmm classes vs GencodeV7 TSS (confidence=not_low).

Comparison plots (segmentation classes vs. TSS). Each plot shows two bar plots: the percentage of each class occupied by TSS sites (counted as one base per site), and the log2 enrichment (or depletion) of TSS in each class. “Occupied” is calculated base-per-base (the number of overlapping bases).

View the bar plots here: Bar Plots
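Because each TSS is counted as a single base, the per-class numbers reduce to a point-in-segment lookup. The sketch below is one possible way to compute them, not the code that produced the plots. It assumes a sorted, non-overlapping segmentation of one sequence given as hypothetical `(start, end, label)` tuples with half-open coordinates, and it assumes the enrichment background is the TSS density over all segmented bases.

```python
import math
from bisect import bisect_right
from collections import Counter


def tss_stats(segments, tss_positions):
    """Return {label: (percent_occupied, log2_enrichment)} per class.

    segments: sorted, non-overlapping (start, end, label), half-open.
    tss_positions: 0-based positions, one per TSS site.
    """
    starts = [start for start, _, _ in segments]
    class_bases = Counter()
    for start, end, label in segments:
        class_bases[label] += end - start

    sites = Counter()
    for pos in set(tss_positions):  # each distinct site counts as one base
        k = bisect_right(starts, pos) - 1
        if k >= 0 and pos < segments[k][1]:
            sites[segments[k][2]] += 1

    total_bases = sum(class_bases.values())
    total_sites = sum(sites.values())
    stats = {}
    for label, bases in class_bases.items():
        density = sites[label] / bases
        if sites[label] == 0:
            enrichment = float("-inf")  # no TSS falls in this class
        else:
            enrichment = math.log2(density / (total_sites / total_bases))
        stats[label] = (100.0 * density, enrichment)
    return stats
```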

Round 8 Segmentation-vs-Same-Cell-Line RNA Plots

Plots

A comparison of round 8 ChromHmm classes vs same-cell-line RNA and TSS.

The plots show the number of TSS in same-cell-line RNA. One plot shows the proportion of the RNA that is a TSS; the other shows the raw number of TSS, not normalized.

View the plots here: Plots

Round 8 Segmentation Plots

Plots

A comparison of round 8 Segway and ChromHmm classes vs annotated regions (features).

Segmentations (columns in the table below) are referred to by the following abbreviations:
 S K562 ALL  = segway k562.all
 S TIER1-2  = segway tier1-2.coordinated
 S K562  = segway k562.coordinated
 S GM12878  = segway gm12878.coordinated
 S H1  = segway h1hesc.coordinated
 S HUVEC  = segway huvec.coordinated
 S HELA  = segway helas3.coordinated
 S HEPG2  = segway hepg2.coordinated
 C HUVEC  = chromHmm HUVEC_concatenate_25
 C HELA  = chromHmm HELA_concatenate_25
 C H1  = chromHmm H1_concatenate_25
 C K562  = chromHmm K562_concatenate_25
 C GM12878  = chromHmm GM12878_concatenate_25
 C HEPG2  = chromHmm HEPG2_concatenate_25

Heat maps (segmentation classes vs. features). These show the same information as the “Comparison plots” below, but with each feature reduced to one column in a heatmap. Each heatmap shows the log2 enrichment (or depletion) of every feature in each class. “Occupied” is calculated base-per-base (the number of overlapping bases).

View them all together here: Heat maps

Or view them separately here:
S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2

Value plots (some value computed over segmentation classes). Each plot shows a bar plot of the statistic computed over all the bases in a class.
GC content S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
CpG islands S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
CpG observed/expected S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
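For illustration, the sketch below computes two of these statistics over all the bases in a class. The formulas are not stated on this page; the sketch assumes GC content is (C+G)/(A+C+G+T) with Ns excluded, and that CpG observed/expected follows the common definition (number of CpG × number of called bases) / (number of C × number of G), pooled over the whole class. The function name and the single-sequence, half-open interval inputs are hypothetical.

```python
def class_sequence_stats(sequence, class_intervals):
    """Return (gc_content, cpg_obs_exp) pooled over a class's intervals.

    sequence: reference sequence as a string.
    class_intervals: non-overlapping half-open (start, end) intervals.
    Either value is None when it is undefined (e.g. only Ns in the class).
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    cpg = 0
    for start, end in class_intervals:
        chunk = sequence[start:end].upper()
        for base in counts:
            counts[base] += chunk.count(base)
        # CpG dinucleotides that straddle an interval boundary are not counted.
        cpg += chunk.count("CG")

    called = sum(counts.values())  # bases other than N
    gc_content = (counts["C"] + counts["G"]) / called if called else None
    c_times_g = counts["C"] * counts["G"]
    cpg_obs_exp = cpg * called / c_times_g if c_times_g else None
    return gc_content, cpg_obs_exp
```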

Comparison plots (segmentation classes vs. features). Each plot shows two bar plots: the percentage of each class occupied by the feature, and the log2 enrichment (or depletion) of the feature in each class. “Occupied” is calculated base-per-base (the number of overlapping bases).
repeats (repeat masker) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
CpG Islands S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
all annotated biotypes (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
no annotated biotype (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
miRNA (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
misc RNA (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
processed transcript (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
protein coding (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
pseudogene (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
rRNA (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
scRNA pseudogene (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
snRNA (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
snRNA pseudogene (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
snoRNA (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
snoRNA pseudogene (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
non coding (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
retained intron (gencode_v3c) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2
missing assembly (Ns) S K562 ALL S TIER1-2 S K562 S GM12878 S H1 S HUVEC S HELA S HEPG2 C HUVEC C HELA C H1 C K562 C GM12878 C HEPG2


Depth of Conservation

Plots

Multiz 46-way Alignable Proportion      (safari pdf)
Multiz 46-way Alignable Proportion, “Normalized”      (safari pdf)
Above plots, linked for navigation  

Definitions

For a set of intervals (e.g. a Segway class), a reference species (human), and an alignment to some other species, the Alignable Proportion is the fraction of intervals that are at least 50% aligned. The “normalized” version of the plots divides this value by the same measure computed over RefSeq coding exons.
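The definition above can be sketched as follows. This is a minimal illustration rather than the code behind the plots: it assumes the alignment to the other species has already been reduced to non-overlapping aligned blocks in human coordinates, given as hypothetical half-open `(start, end)` tuples on a single sequence, and the function names are invented for the example.

```python
def aligned_bases(interval, blocks):
    """Bases of one interval covered by non-overlapping aligned blocks."""
    start, end = interval
    return sum(
        max(0, min(end, b_end) - max(start, b_start))
        for b_start, b_end in blocks
    )


def alignable_proportion(intervals, blocks, threshold=0.5):
    """Fraction of intervals with at least `threshold` of their bases aligned."""
    if not intervals:
        return 0.0
    hits = sum(
        1
        for start, end in intervals
        if aligned_bases((start, end), blocks) >= threshold * (end - start)
    )
    return hits / len(intervals)


def normalized_alignable_proportion(intervals, coding_exons, blocks):
    """Divide by the same measure restricted to coding exons."""
    return alignable_proportion(intervals, blocks) / alignable_proportion(
        coding_exons, blocks
    )
```

The linear scan over `blocks` keeps the example short; a genome-scale run would need an indexed lookup instead.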

Curve fits are only through high-coverage assemblies at the distance of horse and beyond. These species are shown in black in the horizontal list of names.

Presentation

Slides for 20/May/2010 (Depth of Conservation): powerpoint pdf

Dead Zones

A comparison of ChromHmm and Segway short-range classes vs annotated regions.
Integration Vignette B02

Plots

Portion of genome in each class
GC content
CpG observed/expected
CpG islands
Bases in any repeat
“Detectable” bases
“Mapability” (unique as 36-mer, plus strand only)
“Mapability” (unique as 36-mer, plus or minus strand)
“Mapability” (unique as 50-mer, plus strand only)
“Mapability” (unique as 50-mer, plus or minus strand)
Bases in any gene
Bases in any exon
Bases in any coding exon
Bases in any intron
Bases in any coding gene
Bases in any exon of a coding gene
Bases in any coding exon of a coding gene
Bases in any intron of a coding gene
Bases in any non-coding gene
Bases in any exon of a non-coding gene
Bases in any intron of a non-coding gene
High coverage SNPs
Low coverage SNPs
GWAS Catalog SNPs
GWAS Catalog SNPs (Nov/9/2010 update)
GWAS SNPs from Johnson & O'Donnell 2009
Encode PILOT, TRE clusters
Encode PILOT, TRE deserts

RNA-seq Plots

These plots are based on genome mappings made from RNAseq data, for 8 different CSHL RNA samples. Mapping cluster files were provided by Sarah Djebali. For these plots, the clusters were reduced to simple intervals (i.e. flattened), and intervals were generated for intersection with Gencode V3C biotypes. The resulting intervals were then compared to the segmentation classes by simple bp intersection count. The plots show the results of those counts, expressed as the percentage of each segmentation class covered by a “feature”, and as the percentage of the feature covered by each class.
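A minimal sketch of the flatten-then-intersect step is shown below, under stated assumptions: clusters are hypothetical half-open `(start, end)` tuples on one sequence, the segmentation is a list of `(start, end, label)` tuples, and the biotype intersection is omitted. The names are illustrative and not taken from the actual pipeline.

```python
from collections import defaultdict


def flatten(intervals):
    """Reduce possibly overlapping clusters to simple, disjoint intervals."""
    flat = []
    for start, end in sorted(intervals):
        if flat and start <= flat[-1][1]:
            flat[-1][1] = max(flat[-1][1], end)
        else:
            flat.append([start, end])
    return [(s, e) for s, e in flat]


def coverage_by_class(segments, clusters):
    """Return {label: (pct_of_class_covered, pct_of_feature_in_class)}."""
    feature = flatten(clusters)
    feature_bases = sum(end - start for start, end in feature)
    class_bases = defaultdict(int)
    shared = defaultdict(int)
    for seg_start, seg_end, label in segments:
        class_bases[label] += seg_end - seg_start
        # Simple bp intersection count against every flattened interval.
        for f_start, f_end in feature:
            shared[label] += max(0, min(seg_end, f_end) - max(seg_start, f_start))
    return {
        label: (
            100.0 * shared[label] / bases,
            100.0 * shared[label] / feature_bases if feature_bases else 0.0,
        )
        for label, bases in class_bases.items()
    }
```

The nested loop is quadratic and is only meant to make the counting explicit.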

RNA samples are referred to by the following abbreviations:
 RNAunion = Union of PP.001C, PP.002C, PP.001N, PP.002N, PP.001WC, PP.002WC, PM.001WC and PM.002WC
 PP.001C = PolyA+ K562 Cytosol bioreplicate1
 PP.002C = PolyA+ K562 Cytosol bioreplicate2
 PP.001N = PolyA+ K562 Nucleus bioreplicate1
 PP.002N = PolyA+ K562 Nucleus bioreplicate2
 PP.001WC = PolyA+ K562 Whole cell bioreplicate1
 PP.002WC = PolyA+ K562 Whole cell bioreplicate2
 PM.001WC = PolyA- K562 Whole cell bioreplicate1
 PM.002WC = PolyA- K562 Whole cell bioreplicate2

The plots:
all cluster intervals RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
all annotated biotypes RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
no annotated biotype RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
miRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
misc RNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
processed transcript RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
protein coding RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
rRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
scRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snoRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snoRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
non coding RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
retained intron RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC

RNA-seq Coverage Depth Plots

These plots are based on the same RNAseq mapping cluster files as above, but make use of the number of reads mapped to each cluster. For these plots, each cluster was assumed to have uniform read depth (this is obviously not true, but is the best assumption possible from the cluster files). Intervals were generated for intersection with Gencode V3C biotypes, keeping the depth from the corresponding cluster intervals. Those resulting intervals were then attributed to the segmentation classes (splitting intervals further where appropriate), producing an average read depth over each class. Note that this depth is averaged over the entire class, including intervals where no reads were mapped.
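The averaging described above can be sketched as follows. The page does not say how a cluster's read count was converted to a per-base depth, so the sketch simply takes a hypothetical `depth` value per cluster as input and treats it as uniform across the cluster. It also assumes half-open coordinates on one sequence and that depths from overlapping clusters add.

```python
from collections import defaultdict


def mean_depth_by_class(segments, clusters):
    """Average read depth per class, including bases with no mapped reads.

    segments: (start, end, label) tuples, half-open, non-overlapping.
    clusters: (start, end, depth) tuples; depth is assumed uniform
              over the cluster (a simplification, as noted above).
    """
    class_bases = defaultdict(int)
    depth_sum = defaultdict(float)
    for seg_start, seg_end, label in segments:
        class_bases[label] += seg_end - seg_start
        for c_start, c_end, depth in clusters:
            # Split the cluster at the segment boundary and keep its depth.
            overlap = min(seg_end, c_end) - max(seg_start, c_start)
            if overlap > 0:
                depth_sum[label] += overlap * depth
    # Dividing by all bases in the class averages in the zero-depth bases.
    return {
        label: depth_sum[label] / bases
        for label, bases in class_bases.items()
    }
```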

RNA samples are referred to by the same abbreviations as above.

The coverage depth plots:
all cluster intervals RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
all annotated biotypes RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
no annotated biotype RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
miRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
misc RNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
processed transcript RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
protein coding RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
rRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
scRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snoRNA RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
snoRNA pseudogene RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
non coding RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC
retained intron RNAunion PP.001C PP.002C PP.001N PP.002N PP.001WC PP.002WC PM.001WC PM.002WC

Dead Zone-related Data

Segmentation Stats spreadsheet (xls file)

Segmentations

Segmentation files

Presentation

Slides for 16/Sep/2010 (Dead Zones): powerpoint pdf