Poster Presentation, Society for Molecular Biology and Evolution Conference 2016

Robust identification of hard and soft sweeps in humans via machine learning (#651)

Daniel R Schrider¹, Andrew D Kern¹
  1. Rutgers University, Piscataway, NJ, USA

Detecting the targets of adaptive natural selection from whole-genome sequencing data is a central problem in population genetics. Numerous approaches have been devised to detect the population genetic signature of a de novo beneficial mutation sweeping rapidly to fixation (a hard selective sweep). To date, however, most of these methods perform poorly under realistic demographic scenarios. Moreover, the past decade has seen renewed interest in determining the importance of selection on standing variation (soft sweeps) in the adaptation of natural populations, yet few methods are sensitive to this mode of selection. Here we introduce a new tool, S/HIC, which uses supervised machine learning to precisely infer the locations of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy both for detecting sweeps under demographic histories relevant to natural populations and for distinguishing sweeps from regions linked to a nearby sweep as well as from neutrally evolving regions. We further show that S/HIC is uniquely robust among its competitors to demographic misspecification: even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC retains strong discriminatory power. Next, we apply S/HIC to resequencing data from European and African human population samples from the 1000 Genomes Project. S/HIC reliably recovers selective sweeps previously identified using less specific and less sensitive methods, and it identifies several compelling novel candidates, including a tumor suppressor gene that is often mutated or deleted in breast tumors. Lastly, we perform the first genome-wide examination of the relative prevalence of hard versus soft sweeps in human populations, finding a much greater frequency of soft sweeps in Africa. Because African populations have historically had larger effective sizes than European populations, this result confirms theoretical predictions that larger populations will more often respond to adaptive challenges by selecting on previously standing polymorphisms.