Poster Presentation Society for Molecular Biology and Evolution Conference 2016

Robustness of Bayesian molecular dating to tree prior misspecification (#585)

Andrew M Ritchie 1, Nathan Lo 1, Simon YW Ho 1
  1. School of Life and Environmental Sciences, University of Sydney, NSW, Australia

The practice of specifying prior probabilities is a defining feature of Bayesian phylogenetics. In Bayesian molecular dating, the specification of priors on divergence times, also known as tree priors, is especially contentious. Existing tree priors include pure-birth and birth-death priors used for species-level data and coalescent priors designed for population-level data. All tree priors make strong assumptions about the underlying evolutionary process and are likely to be misspecified to some degree for many real data sets. However, because sufficient sequence data are assumed to overcome any biases in the prior, there has been little empirical investigation into the behaviour of real analyses under different tree priors. This could allow undetected errors to be introduced into analyses of non-conforming data sets, such as those including a mixture of between- and within-species relationships.

We tested the robustness of Bayesian analyses to increasing degrees of tree prior misspecification by simulating mixed inter- and intra-species data sets along a continuum from few species with many individuals (coalescent-like) to many species with few individuals (pure-birth-like). We estimated divergence dates under each tree prior and compared the analyses for accuracy, precision and model fit. We confirmed the applicability of our results to three empirical data sets for cetaceans, phocids and whitefish.
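
As an illustration of how accuracy and precision might be summarised for a single node in a simulated data set, the following is a minimal sketch rather than the procedure used in this study. It assumes that posterior samples of a node age and the true simulated age are available, measures accuracy as the relative error of the posterior median, and measures precision as the width of an equal-tailed 95% credible interval relative to the true age. The function names and the choice of an equal-tailed interval (rather than a highest posterior density interval) are assumptions made for illustration.

```python
from statistics import median


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarise_node_age(posterior_samples, true_age):
    """Summarise one node-age estimate against the known simulated age.

    Hypothetical helper: the metric definitions are illustrative
    assumptions, not those reported in the abstract.
    """
    samples = sorted(posterior_samples)
    point = median(samples)
    lower = quantile(samples, 0.025)  # equal-tailed 95% interval
    upper = quantile(samples, 0.975)
    return {
        # Accuracy: signed error of the point estimate, scaled by the truth.
        "relative_error": (point - true_age) / true_age,
        # Precision: interval width, scaled by the truth.
        "relative_interval_width": (upper - lower) / true_age,
        # Coverage: whether the interval contains the true age.
        "covers_truth": lower <= true_age <= upper,
    }
```

Applying such a summary to every node under each tree prior would give per-prior distributions of error and interval width that can be compared across the simulated continuum.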

Our results suggest that Bayesian dating is quite robust to the choice of tree prior in most cases, even when the prior is severely misspecified for the data. However, simpler priors such as the pure-birth prior can produce inaccurate results on some heterogeneous data sets even with substantial amounts of sequence data. More complex priors show greater robustness and should be preferred where more extensive model testing is not practicable.