Oral Presentation Society for Molecular Biology and Evolution Conference 2016

Multiple nucleotide mutations cause rampant false positive inferences of selection on the human lineage (#60)

Aarti Venkat 1, Joseph Thornton 1, Matthew Hahn 2
  1. University of Chicago, Chicago, IL, United States
  2. Department of Biology, Indiana University, Bloomington, Indiana, United States

The branch-sites test has been the basis for thousands of inferences of genes under lineage-specific positive selection. The test’s models assume that mutations occur independently. But DNA replication is known to produce multiple mutations within a codon more frequently than expected under independence, and such multi-nucleotide mutations (MNMs) are more likely than single-nucleotide mutations to be non-synonymous. We therefore hypothesized that MNMs produced by neutral processes might cause false inferences of positive selection in the branch-sites test, and we sought to determine the extent of this bias, if any, along the human lineage. We analyzed a mammalian genome-wide dataset and found that codons with MNMs provided all the support for the positive selection model. When multi-nucleotide mutational processes were incorporated into the branch-sites model, 93% of genes inferred to be positively selected in the original test lost their signatures of selection. To determine whether realistic rates of MNM generation cause false positive inferences, we simulated evolution under model parameters derived from the mammalian dataset, with MNMs but without positive selection; we found that conditions associated with 96% of the genes analyzed led to unacceptable false positive rates in the branch-sites test. Under typical genome-wide evolutionary conditions, a rate of MNM production considerably lower than that observed in experimental studies of mutational processes was sufficient to cause frequent false positive inferences. Our results indicate that many genes found to be under positive selection using the branch-sites test, including the majority of such genes on the human lineage, may be artifacts of unincorporated neutral mutational processes.
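The premise that MNMs are more likely than single-nucleotide mutations to be non-synonymous can be illustrated with a simple count over the standard genetic code. The sketch below is not part of the analysis described above. It assumes that every nucleotide change is equally likely, ignores codon frequencies and transition/transversion bias, and excludes changes to or from stop codons. The function name `nonsynonymous_fraction` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nonsynonymous_fraction(n_changes):
    """Fraction of sense-to-sense codon changes that alter the amino acid,
    among all codon pairs differing at exactly n_changes positions.

    Assumption for illustration: all changes are weighted equally.
    """
    total = 0
    nonsyn = 0
    for src, src_aa in CODON_TABLE.items():
        if src_aa == "*":
            continue  # skip stop codons as starting points
        for dst, dst_aa in CODON_TABLE.items():
            if dst_aa == "*":
                continue  # skip changes that create a stop codon
            differences = sum(a != b for a, b in zip(src, dst))
            if differences != n_changes:
                continue
            total += 1
            if src_aa != dst_aa:
                nonsyn += 1
    return nonsyn / total


if __name__ == "__main__":
    for n in (1, 2):
        frac = nonsynonymous_fraction(n)
        print(f"{n}-nucleotide changes: {frac:.3f} non-synonymous")
```

Under these simplifying assumptions, two-nucleotide changes within a codon are non-synonymous more often than one-nucleotide changes, which is the direction of the effect the abstract relies on; the magnitude in real data depends on the actual mutational spectrum and codon usage.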