In population genomics, observing changes in allele frequencies plays an important role in understanding evolutionary forces. Recent advances in sequencing technologies have made it possible to observe evolutionary trajectories in more detail than ever. It has become feasible not only to sequence the final generation of a population at the end of a long-term treatment, but also to monitor genome evolution experimentally at intermediate generations in Drosophila. However, distinguishing alleles that change under selection from those that merely drift is challenging because of the large number of false positives.
Here we present a Gaussian process (GP) approach to modelling evolutionary time series data. First, we infer the single nucleotide polymorphism (SNP) frequencies and their observation noise variances from sequencing data, using a Beta-Binomial model for the read counts obtained from next-generation sequencing. Then we fit time-dependent and time-independent GP models to the logistic-transformed frequencies, incorporating the inferred noise variances into the models. Finally, we compute the Bayes factor between the time-dependent and time-independent GP models for each SNP and rank the SNPs by their Bayes factors.
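The three steps above can be illustrated with a minimal sketch for a single SNP in a single replicate. It is not the implementation used in this work and rests on several assumptions: a uniform Beta(1, 1) prior on the allele frequency, the logit as the transformation, delta-method propagation of the variance, a squared-exponential kernel for the time-dependent model, and fixed (not optimised) kernel hyperparameters. All function names, parameter values, and read counts are hypothetical.

```python
import numpy as np

def beta_posterior(alt_reads, depth, a=1.0, b=1.0):
    """Posterior mean and variance of the allele frequency.
    Assumes a Beta(a, b) prior with binomially distributed read counts."""
    alpha = a + alt_reads
    beta = b + depth - alt_reads
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var

def to_logit(mean, var):
    """Logit transform; the variance is propagated with the delta method."""
    y = np.log(mean / (1.0 - mean))
    y_var = var / (mean * (1.0 - mean)) ** 2
    return y, y_var

def log_marginal_likelihood(y, K):
    """Log density of y under a zero-mean Gaussian with covariance K."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))

def log_bayes_factor(t, y, y_var, signal_var=1.0, lengthscale=30.0, bias_var=4.0):
    """Log Bayes factor of the time-dependent over the time-independent GP.
    Hyperparameters are fixed here purely for illustration."""
    noise = np.diag(y_var)                         # per-point inferred noise
    bias = bias_var * np.ones((len(t), len(t)))    # constant (time-independent) part
    sqdist = (t[:, None] - t[None, :]) ** 2
    rbf = signal_var * np.exp(-0.5 * sqdist / lengthscale ** 2)
    return (log_marginal_likelihood(y, rbf + bias + noise)
            - log_marginal_likelihood(y, bias + noise))

# Hypothetical data: eight sampled generations, 100 reads per time point.
t = np.arange(0.0, 80.0, 10.0)
depth = np.full(8, 100.0)
flat_counts = np.array([20, 22, 19, 21, 20, 18, 21, 20], dtype=float)
rising_counts = np.array([20, 28, 37, 46, 55, 64, 72, 80], dtype=float)

lbf_flat = log_bayes_factor(t, *to_logit(*beta_posterior(flat_counts, depth)))
lbf_rising = log_bayes_factor(t, *to_logit(*beta_posterior(rising_counts, depth)))
print(f"log BF (flat):   {lbf_flat:.2f}")
print(f"log BF (rising): {lbf_rising:.2f}")
```

In this sketch the steadily rising trajectory receives a much larger log Bayes factor than the flat one, which is the quantity used for ranking. Placing the per-point noise variances on the diagonal of the covariance means that time points with low read depth carry less weight in both models.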
We compare the performance of our method with that of a pairwise statistical test on a simulated dataset that mimics Drosophila sequencing data, with four replicates across eight generations. The results show that our method achieves higher precision at the same recall, and that using the inferred noise variances in the GP models helps to reduce the number of false positives.
We present results from applying our approach to real data from a Drosophila simulans experimental evolution study of temperature adaptation. We also show preliminary results indicating that the proposed method can detect signatures of selection in the ACE gene.