Poster Presentation, Society for Molecular Biology and Evolution Conference 2016

Inferring the unfolded site frequency spectrum and using it to quantify adaptive molecular evolution in Drosophila (#640)

Peter Keightley¹, Tom Booker¹, Jose Campos¹, Brian Charlesworth¹
  ¹ University of Edinburgh, Charlotte Auerbach Rd, United Kingdom

The unfolded site frequency spectrum (uSFS) is a vector of counts of sites with different numbers of copies of the derived allele in a sample of gene copies from a population. It contains more information than the folded SFS, which counts sites by the number of copies of the minor allele and therefore cannot distinguish a derived allele present in i of n sampled copies from one present in n − i copies. Inferring the uSFS requires outgroup sequences to distinguish ancestral from derived alleles, and this inference is subject to statistical uncertainty because multiple hits (more than one substitution at the same site) can occur along the outgroup lineages. We present a new approach to inferring the uSFS and evaluate it using simulations, which show that using two outgroups rather than one usually gives a substantial increase in precision. We apply the approach to infer the uSFSs for synonymous and nonsynonymous sites of protein-coding genes in Drosophila, using polymorphism data from whole-genome sequencing projects together with the sequences of outgroup species. We then use these uSFSs with the software DFE-alpha to infer the distribution of fitness effects of new mutations, i.e., the relative frequencies and effects of deleterious and advantageous mutations. Models that include a significant fraction of advantageous mutations fit the polymorphism data substantially better than models that assume only deleterious and neutral mutations.
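To make the difference between the unfolded and folded spectra concrete, the Python sketch below builds a uSFS from per-site derived-allele counts and then folds it. It polarizes each site with a single outgroup by naive parsimony (the outgroup base is taken as ancestral). This is an illustrative assumption, not the new inference method described in this abstract, and the function names `derived_count`, `unfolded_sfs` and `fold` are hypothetical. The example also shows how a multiple hit in the outgroup lineage mis-assigns the ancestral state, moving a site from class i to class n − i, so that two different uSFSs fold to the same folded SFS.

```python
from collections import Counter
from typing import List, Optional, Sequence


def derived_count(sample: str, outgroup: str) -> Optional[int]:
    """Count derived-allele copies at one site, assuming the single outgroup
    base is ancestral (naive parsimony; illustrative assumption only).

    Returns None for sites this rule cannot polarize: more than two alleles
    in the sample, or a polymorphic site where the outgroup carries neither.
    """
    alleles = Counter(sample)
    if len(alleles) > 2 or (len(alleles) == 2 and outgroup not in alleles):
        return None
    # Counter returns 0 for a missing key, so a sample fixed for a base that
    # differs from the outgroup is scored as n derived copies.
    return len(sample) - alleles[outgroup]


def unfolded_sfs(counts: Sequence[Optional[int]], n: int) -> List[int]:
    """uSFS: entry i is the number of sites with i derived copies (i = 0..n).
    Unpolarizable sites (None) are skipped."""
    sfs = [0] * (n + 1)
    for i in counts:
        if i is None:
            continue
        if not 0 <= i <= n:
            raise ValueError(f"derived count {i} is outside 0..{n}")
        sfs[i] += 1
    return sfs


def fold(usfs: Sequence[int]) -> List[int]:
    """Folded SFS: classes i and n - i merge into the minor-allele class
    min(i, n - i), giving entries 0..n // 2."""
    n = len(usfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, num_sites in enumerate(usfs):
        folded[min(i, n - i)] += num_sites
    return folded


if __name__ == "__main__":
    samples = ["AAAG", "AAGA", "GAAA", "AAGG", "AAAA"]  # n = 4 gene copies
    # Outgroup matches the true ancestral base (A) at every site.
    correct = [derived_count(s, "A") for s in samples]
    # Hypothetical multiple hit: the outgroup lineage also changed A -> G
    # at the first three sites, so G is wrongly taken as ancestral there.
    misled = [derived_count(s, o) for s, o in zip(samples, "GGGAA")]
    u_true, u_wrong = unfolded_sfs(correct, 4), unfolded_sfs(misled, 4)
    print(u_true, u_wrong)              # [1, 3, 1, 0, 0] [1, 0, 1, 3, 0]
    print(fold(u_true), fold(u_wrong))  # [1, 3, 1] [1, 3, 1]
```

Because both folded spectra are identical, the folded SFS alone cannot reveal the polarization error; the extra information in the uSFS is only as reliable as the ancestral-state inference behind it.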