Preparing the dataset for subsequent tools

Next, we will convert the dataset to a different format and filter the SNPs, in preparation for running the PCA and Ancestry tools. In the Population Structure subsection of Genome Diversity, the Prepare Input tool reformats gd_snp datasets for use by several of the other tools.

The gd_snp dataset should already be selected automatically. This time we will require a minimum coverage of only five reads, because a SNP must meet this threshold in every individual to be used (unlike the Phylogenetic Tree tool, which looks at individuals in pairs). This lower value should let us keep enough SNPs while still eliminating the unreliable ones. Leave the other options at their defaults, and click the Execute button.

[screen shot]
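
To make the coverage rule concrete, here is a minimal Python sketch of the idea that a SNP is kept only if every individual reaches the minimum read count. It is an illustration only, not the Prepare Input tool's actual code: the function name, the dictionary layout, and the read counts are all hypothetical, and a real gd_snp file stores more columns per individual than this.

```python
def passes_min_coverage(read_counts, min_coverage=5):
    """Return True only if every individual has at least min_coverage reads."""
    return all(count >= min_coverage for count in read_counts)


# Hypothetical example data: SNP name -> read count for each of three individuals.
snps = {
    "snp_a": [7, 9, 5],   # all individuals at or above 5: kept
    "snp_b": [12, 4, 8],  # one individual has only 4 reads: dropped
    "snp_c": [5, 5, 5],   # exactly at the threshold: kept
}

kept = [name for name, counts in snps.items() if passes_min_coverage(counts)]
print(kept)  # ['snp_a', 'snp_c']
```

Note how a single low-coverage individual is enough to discard a SNP. With many individuals, a high threshold would therefore remove most SNPs, which is why a lower value is used here than for a tool that compares individuals in pairs.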