The availability of genome-scale inter- and intraspecies data opens new opportunities in phylogenetics, both to improve the accuracy and resolution of trees and to take important steps towards understanding the process of speciation.
We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees from many within-species samples. PoMo expands the alphabet of DNA substitution models to include polymorphic states (De Maio et al., MBE 2013). It is a selection-mutation model that separates the mutation process from the fixation process, and genetic drift is modeled by a Moran process. Although only a single phylogeny (the species tree) is considered, PoMo naturally accounts for incomplete lineage sorting because ancestral populations can be in a polymorphic state. A large-scale simulation study and applications to great ape data (12 populations in total; Prado-Martinez et al., 2013) show that PoMo is fast and more accurate than coalescent-based methods (De Maio, Schrempf, and Kosiol, Syst. Biol. 2015).
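To make the expanded alphabet concrete, the following is a minimal sketch of a neutral PoMo rate matrix, not the revPoMo implementation. It assumes a virtual population of N individuals, biallelic polymorphic states only, mutations that enter only from fixed (monomorphic) states at rate N·mu[i][j], and neutral Moran drift in which the count n of one allele moves by one at rate n(N - n)/N. Selection and the reversibility constraints of revPoMo are omitted, and the function names and the mutation-rate input `mu` are hypothetical.

```python
import itertools

import numpy as np


def pomo_states(N):
    """List PoMo states as 4-tuples of allele counts (A, C, G, T) summing to N.

    Fixed states have one count equal to N; polymorphic states have exactly
    two non-zero counts (biallelic sites only). Requires N >= 2.
    """
    states = []
    for i in range(4):
        counts = [0, 0, 0, 0]
        counts[i] = N
        states.append(tuple(counts))
    for i, j in itertools.combinations(range(4), 2):
        for n in range(1, N):
            counts = [0, 0, 0, 0]
            counts[i], counts[j] = n, N - n
            states.append(tuple(counts))
    return states


def pomo_rate_matrix(N, mu):
    """Build a neutral PoMo rate matrix Q (boundary mutation + Moran drift).

    mu[i][j] is an assumed mutation rate from allele i to allele j.
    """
    states = pomo_states(N)
    index = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s, k in index.items():
        present = [a for a in range(4) if s[a] > 0]
        if len(present) == 1:
            # Fixed state: a single new mutant allele enters the population.
            i = present[0]
            for j in range(4):
                if j != i:
                    target = [0, 0, 0, 0]
                    target[i], target[j] = N - 1, 1
                    Q[k, index[tuple(target)]] = N * mu[i][j]
        else:
            # Polymorphic state: one Moran step shifts one copy between alleles;
            # reaching a count of 0 or N lands in the corresponding fixed state.
            i, j = present
            n = s[i]
            rate = n * (N - n) / N
            for step in (1, -1):
                target = list(s)
                target[i] += step
                target[j] -= step
                Q[k, index[tuple(target)]] = rate
    # Diagonal entries make every row sum to zero.
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q
```

With N = 10 this yields 4 + 6(N - 1) = 58 states, and each row of Q sums to zero, as required for a continuous-time Markov chain.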
We recently implemented revPoMo in the maximum likelihood software IQ-TREE (Nguyen et al., 2015). Runtimes of our approach are now comparable to those of standard substitution models on large trees (e.g., 60 species with 10 individuals each), but revPoMo is much more accurate in estimating trees, divergence times, and mutation rates under scenarios of recent radiation and incomplete lineage sorting. A key advantage of revPoMo is that increasing the sample size per species improves estimates without increasing runtime. revPoMo is therefore a valuable tool with applications ranging from speciation dating to species tree reconstruction. We also present preliminary results from applying our method to genome-wide data from baboons, which provide insights into their complex population history as well as new estimates of divergence times and mutation rates.
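One way to see why larger samples need not slow the method down is the following minimal sketch. It assumes that the allele counts observed at a leaf relate to the PoMo state by sampling with replacement (multinomial, binomial for biallelic sites) from a virtual population of size N. Under that assumption the leaf's partial likelihood vector has one entry per PoMo state, so its length depends on N but not on the number M of sampled individuals. The function names are hypothetical, and this is not presented as IQ-TREE's actual code path.

```python
import itertools
from math import factorial, prod


def pomo_states(N):
    """PoMo states as 4-tuples of allele counts (A, C, G, T) summing to N."""
    states = []
    for i in range(4):
        counts = [0, 0, 0, 0]
        counts[i] = N
        states.append(tuple(counts))
    for i, j in itertools.combinations(range(4), 2):
        for n in range(1, N):
            counts = [0, 0, 0, 0]
            counts[i], counts[j] = n, N - n
            states.append(tuple(counts))
    return states


def tip_partial_likelihood(N, sample):
    """P(observed sample | PoMo state) for every state (hypothetical helper).

    sample is a 4-tuple of allele counts (A, C, G, T) observed in M sampled
    individuals; sampling with replacement from the population is assumed.
    """
    M = sum(sample)
    coef = factorial(M) // prod(factorial(m) for m in sample)
    likelihoods = []
    for state in pomo_states(N):
        # Multinomial probability with allele frequencies state[a] / N;
        # it is 0 whenever the sample contains an allele absent from the state.
        p = coef * prod((n / N) ** m for n, m in zip(state, sample))
        likelihoods.append(p)
    return likelihoods
```

Raising M from 10 to 100 individuals changes only the values in this vector, not its length, so the per-site cost of the pruning algorithm stays governed by N.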