Oral Presentation, Society for Molecular Biology and Evolution Conference 2016

Probabilistic inference of positive and negative selection in cancer (#92)

Donate Weghorn¹, Matteo D'Antonio², Kelly Frazer², Shamil Sunyaev¹
  1. Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Cambridge, MA, United States
  2. Division of Genome Information Sciences, University of California, San Diego, La Jolla, CA, United States

Seen from an evolutionary perspective, cancer is a highly complex system that evolves asexually under high mutation rates and strong selective pressures. The recurrence of mutations, or their marked absence, attests to the selection acting on a given sequence, but specifying the proper mutational null model, i.e., how many mutations to expect under neutrality, is highly nontrivial. Here we present a probabilistic approach to this question, which we apply to 17 cancer types. Using an empirical Bayes framework, we infer the distribution of mutation rates across genes that underlies the observed distribution of synonymous mutation counts within a given cancer type. This enables inference of the posterior distribution of nonsynonymous mutation counts under neutrality without additional parameters, while explicitly taking into account cancer type-specific mutational signatures, which are known to be highly distinct. The set of genes we predict to be under significant positive selection overlaps substantially with known cancer genes. In addition, we use our model and the large patient cohort to quantify negative selection. While the genome-wide average signal of negative selection is weak, we find marginally significant, cancer type-specific sets of candidate genes. Moving from coding to non-coding sequence, we apply a similar approach to detect hypermutation in breast cancer DNase I hypersensitive sites (DHSs), an indicator of positive regulatory selection. Here, the background mutation rate is inferred by clustering DHSs according to known covariates of the mutation rate, and it again determines the expected number of mutations under neutrality. We find 22 putative breast cancer driver DHSs, three of which are significantly hypermutated across 19 cancer types. We further validated one of these DHSs experimentally and another using expression data.
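The empirical Bayes step for coding sequence can be illustrated with a minimal Python sketch. It assumes a gamma prior on per-gene mutation rates and Poisson mutation counts (a standard gamma-Poisson setup, not necessarily the exact model used in this work), fits the prior by the method of moments from synonymous counts alone, and then integrates over each gene's posterior rate to obtain a negative-binomial distribution for its nonsynonymous count under neutrality. The `Gene` fields `syn_sites` and `nonsyn_sites` are hypothetical effective target sizes, assumed to be precomputed by weighting each gene's possible synonymous and nonsynonymous changes by the cancer type's mutational signature; all names and the toy counts are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    syn_count: int       # observed synonymous mutations, pooled over patients
    nonsyn_count: int    # observed nonsynonymous mutations, pooled over patients
    syn_sites: float     # hypothetical signature-weighted synonymous target size
    nonsyn_sites: float  # hypothetical signature-weighted nonsynonymous target size


def fit_gamma_prior(genes):
    """Method-of-moments fit of a Gamma(shape, rate) prior on per-site mutation
    rates, using only synonymous counts, which are assumed to be neutral."""
    mean_rate = sum(g.syn_count for g in genes) / sum(g.syn_sites for g in genes)
    # Var(s) = E[mu] L + Var(mu) L^2 for a gamma-Poisson mixture, so subtract the
    # Poisson part and rescale by L^2 to estimate the variance of the rate itself.
    var_rate = sum(
        ((g.syn_count - mean_rate * g.syn_sites) ** 2 - mean_rate * g.syn_sites)
        / g.syn_sites ** 2
        for g in genes
    ) / len(genes)
    if var_rate <= 0:
        # No detectable overdispersion: use a very tight prior (nearly Poisson).
        var_rate = 1e-6 * mean_rate ** 2
    return mean_rate ** 2 / var_rate, mean_rate / var_rate  # (shape, rate)


def neg_binom_logpmf(k, r, log_p, log_q):
    """log P(N = k) for a negative binomial with size r, given log p and log(1 - p)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * log_p + k * log_q)


def selection_tail_probs(gene, shape, rate):
    """Posterior-predictive tail probabilities of the nonsynonymous count under
    neutrality: (P(N >= observed), P(N <= observed))."""
    # Conjugate update with the gene's synonymous data: Gamma(shape + s, rate + L_syn).
    r = shape + gene.syn_count
    post_rate = rate + gene.syn_sites
    # Poisson(mu * L_nonsyn) integrated over that posterior is negative binomial
    # with p = post_rate / (post_rate + L_nonsyn); work in log space for stability.
    log_total = math.log(post_rate + gene.nonsyn_sites)
    log_p = math.log(post_rate) - log_total
    log_q = math.log(gene.nonsyn_sites) - log_total
    n = gene.nonsyn_count
    below = sum(math.exp(neg_binom_logpmf(k, r, log_p, log_q)) for k in range(n))
    at_n = math.exp(neg_binom_logpmf(n, r, log_p, log_q))
    p_excess = max(0.0, 1.0 - below)     # small -> candidate positive selection
    p_deficit = min(1.0, below + at_n)   # small -> candidate negative selection
    return p_excess, p_deficit


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    genes = [
        Gene("GENE_A", 1, 4, 300.0, 900.0),
        Gene("GENE_B", 12, 30, 300.0, 900.0),
        Gene("GENE_C", 3, 10, 300.0, 900.0),
        Gene("GENE_D", 15, 200, 300.0, 900.0),
        Gene("GENE_E", 4, 11, 300.0, 900.0),
        Gene("GENE_F", 20, 2, 300.0, 900.0),
    ]
    shape, rate = fit_gamma_prior(genes)
    for g in genes:
        p_excess, p_deficit = selection_tail_probs(g, shape, rate)
        print(f"{g.name}: P(N >= obs) = {p_excess:.3g}, P(N <= obs) = {p_deficit:.3g}")
```

In the toy data, GENE_D carries far more nonsynonymous mutations than its synonymous count predicts and receives a very small upper-tail probability, while GENE_F shows the opposite pattern. The moment estimator keeps the sketch short; a maximum-likelihood fit of the prior and a correction for testing many genes would be natural refinements.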
Taken together, these applications demonstrate the value of probabilistic modeling of mutation events for uncovering the various signals of selection in cancer, which may in turn inform targets of cancer therapy.