Understanding the sequence coverage distribution results

The results are a composite dataset. Clicking the eye icon in the history panel opens a page with some information about the tool run that produced the results, along with links to the actual results. The graph gives a good overall view of the coverage, but it can be difficult, especially with many individuals, to tell which line belongs to which individual. The text gives an easier-to-read view of how many SNPs are at each coverage level in each individual.

To get the more reliable SNPs, we will filter for higher coverage. The scores on low-coverage SNP data are not a dependable way of determining reliability. We have nine million SNPs but don't need nearly that many for the analysis. In the text output, the column for eight-times coverage shows what percentage of each individual's SNPs are at or below that coverage. We want at least one hundred thousand SNPs for the analysis, so we can afford to lose about ninety percent of the SNPs. Because each individual has a different total number of reads, we can't get an exact count of how many SNPs this will leave us. The percentages still give a good estimate of where to start, and if the resulting number of SNPs is not what we expected, we can rerun the later tools with a different parameter.
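
The estimate described above is simple arithmetic, and a minimal Python sketch may make it easier to try different cutoffs. It is not part of the tool: the function names and the example percentages are made up for illustration, and it assumes the cumulative "at or below" percentages have been copied by hand from the text output for one individual.

```python
def estimate_remaining(total_snps, percent_at_or_below):
    """Estimate how many SNPs are left after dropping those at or below a coverage level."""
    return round(total_snps * (100.0 - percent_at_or_below) / 100.0)


def highest_usable_cutoff(cumulative_percent, total_snps, min_snps):
    """Return the highest coverage level whose removal still leaves an
    estimated min_snps or more, or None if no level qualifies."""
    best = None
    for coverage in sorted(cumulative_percent):
        remaining = estimate_remaining(total_snps, cumulative_percent[coverage])
        if remaining >= min_snps:
            best = coverage
    return best


# Hypothetical values for one individual: coverage level -> percent of SNPs
# at or below that coverage. These numbers are invented, not real results.
example = {2: 35.0, 4: 62.0, 8: 90.0, 16: 99.5}

print(estimate_remaining(9_000_000, 90.0))
print(highest_usable_cutoff(example, 9_000_000, 100_000))
```

With nine million SNPs and ninety percent at or below the cutoff, the estimate is about nine hundred thousand remaining, comfortably above the one hundred thousand needed. Since the percentages differ between individuals, this is only a starting point, as noted above.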

[screen shot]


[screen shot]
[screen shot]