Oral Presentation Society for Molecular Biology and Evolution Conference 2016

Decay of Accuracy of Genomic Prediction with Genetic Distance (#227)

David J Balding 1, Marco Scutari 2, Ian McKay 3
  1. University of Melbourne, VIC, Australia
  2. Oxford University, Oxford
  3. NIAB, Cambridge

Statistical models for predicting quantitative traits from high-density genomic data are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population on which the genomic prediction model was trained. However, in many settings, models trained in one population are used to predict phenotypes in different populations. We investigate the decay of predictive accuracy as the genetic distance between the training and target populations increases, using clustering and resampling to construct a sequence of target populations of increasing genetic distance from the training population. We find that the correlation between true and predicted values decays approximately linearly with FST (or mean kinship) between the training and target populations. We illustrate this relationship using data sets from mice, wheat and humans. In addition to analysing real-world data, we apply our approach to a simulated multi-generation genomic selection experiment and to simulated phenotypes based on worldwide human genome data.
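
The two quantities related in the abstract, FST between training and target populations and the correlation between true and predicted values, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which FST estimator or prediction model was used, so the Nei-style estimator (with no sample-size correction), the 0/1/2 genotype coding, and all function names here are assumptions.

```python
import numpy as np


def allele_freqs(genotypes):
    """Per-SNP allele frequency from an (individuals x SNPs) matrix coded 0/1/2."""
    return np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0


def fst_nei(geno_a, geno_b):
    """Nei-style FST between two populations: (H_T - H_S) / H_T.

    Assumption: a simple ratio-of-sums estimator without sample-size
    correction; the abstract does not specify an estimator.
    """
    p_a, p_b = allele_freqs(geno_a), allele_freqs(geno_b)
    p_bar = (p_a + p_b) / 2.0
    # Mean within-population expected heterozygosity, per SNP
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2.0
    # Expected heterozygosity of the pooled population, per SNP
    h_t = 2 * p_bar * (1 - p_bar)
    total = h_t.sum()
    if total == 0:
        return 0.0  # no variation at any SNP: treat the distance as zero
    return float((total - h_s.sum()) / total)


def predictive_correlation(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def linear_decay_fit(fst_values, correlations):
    """Least-squares line: correlation ~ intercept + slope * FST."""
    slope, intercept = np.polyfit(fst_values, correlations, deg=1)
    return float(slope), float(intercept)
```

Given one training set and a sequence of target sets, one would compute `fst_nei(train_geno, target_geno)` and `predictive_correlation(target_y, predicted_y)` for each target, then pass the two resulting lists to `linear_decay_fit`; a negative slope would correspond to the decay described above.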