Oral Presentation Society for Molecular Biology and Evolution Conference 2016

Estimating seven coefficients of pairwise relatedness using population genomic data (#226)

Matthew S Ackerman 1, Parul Johri 1, Ken Spitze 1, Sen Xu 2, Thomas Doak 1, Kimberly Young 1, Michael Lynch 1
  1. Indiana University, Bloomington, IN, United States
  2. University of Texas at Austin, Austin, TX

Population structure is described by genotypic correlation coefficients between individuals, the most basic of which are Jacquard's nine condensed coefficients. These correlation coefficients form the basis of quantitative-genetic analysis, and geneticists perform experimental crosses or pedigree analysis to recover them. Molecular techniques can recover these coefficients for individuals whose relationships are unknown, but until now they could recover at most four of them. I have developed a maximum-likelihood method that recovers seven coefficients using the full set of biallelic loci derived from whole-genome sequences. This approach should allow more robust estimation of the components of genetic variance from population-genomic data, and is potentially very useful for conservation genetics.
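
As an illustration of how the nine condensed coefficients feed into quantitative-genetic quantities, the sketch below computes the standard coancestry and inbreeding coefficients from a vector of Jacquard coefficients, assuming the conventional ordering Δ1 to Δ9. The function names and the outbred full-sib example values are assumptions made for illustration; this is not code from mapgd.

```python
def coancestry(delta):
    """Coancestry (kinship) of X and Y from Jacquard's Delta_1..Delta_9.

    Assumes the conventional ordering of the nine condensed coefficients.
    """
    if len(delta) != 9:
        raise ValueError("expected nine condensed coefficients")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    # theta_XY = Delta_1 + (Delta_3 + Delta_5 + Delta_7) / 2 + Delta_8 / 4
    return d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8


def inbreeding(delta):
    """Inbreeding coefficients (f_X, f_Y) of the two individuals."""
    if len(delta) != 9:
        raise ValueError("expected nine condensed coefficients")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    f_x = d1 + d2 + d3 + d4  # states in which X's two alleles are identical
    f_y = d1 + d2 + d5 + d6  # states in which Y's two alleles are identical
    return f_x, f_y


# Illustrative input: textbook values for outbred full siblings.
full_sibs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.25]
print(coancestry(full_sibs))
print(inbreeding(full_sibs))
```

The functions deliberately do not require the inputs to be non-negative, so the same arithmetic applies to estimates that fall below zero.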

Simulations show that the procedure is nearly unbiased, even at the minimally informative 3× coverage, and that errors in five of the seven coefficients are statistically uncorrelated. The sum of the remaining two coefficients provides an unbiased assessment of the overall correlation of heterozygosity between two individuals. These methods have been applied to four populations of the freshwater crustacean Daphnia pulex, revealing several interesting characteristics that are not apparent with other techniques. The use of a maximum-likelihood method also allows us to assess the statistical significance of relationships using a log-likelihood ratio test, and we find statistically significant negative estimates of many of these pairwise relatedness coefficients. Although these coefficients are traditionally regarded as measures of identity probabilities, which cannot be negative, we treat them as measures of conditional association, which can be negative. These methods are part of mapgd, an expansive package of maximum-likelihood programs for the analysis of population-genomic data that I have implemented, which we hope will greatly enhance the power of such studies (available from https://github.com/LynchLab/MAPGD).
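
A minimal sketch of a log-likelihood ratio test for a single coefficient follows. It assumes that twice the log-likelihood difference is compared against a chi-square distribution with one degree of freedom, which is the usual large-sample approximation and not necessarily the reference distribution used in mapgd. The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(loglik_null, loglik_alt):
    """Likelihood ratio test for one extra free parameter.

    loglik_null: maximized log-likelihood with the coefficient fixed at zero.
    loglik_alt:  maximized log-likelihood with the coefficient estimated.
    Assumes a chi-square reference distribution with 1 degree of freedom.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against small negative values from rounding
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, chosen only to show the call.
stat, p = lrt_p_value(loglik_null=-1000.0, loglik_alt=-996.0)
print(stat, p)
```

Because the coefficients are treated as measures of association that may be negative, the null value of zero lies inside the parameter space rather than on its boundary, which is the setting in which this approximation is normally applied.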