Oral Presentation Society for Molecular Biology and Evolution Conference 2016

Assessment of substitution model adequacy for phylogenomics (#163)

David A. Duchene 1, Simon Y.W. Ho 1, Sebastian Duchene 1
  1. University of Sydney, Camperdown, NSW, Australia

Genome sequences offer a rich source of data for studying evolutionary relationships and biological processes. However, they present a number of challenges to phylogenetic methods, including complex patterns of variation and large computational demands. To make phylogenetic analysis more feasible, genomic data are sometimes filtered according to a chosen criterion. One approach is to filter data according to how well the evolutionary model describes them, using one of several tests of model adequacy. However, the efficacy of these tests for identifying when phylogenetic inferences will be unreliable remains unknown. We propose a framework for assessing substitution model adequacy using fast likelihood methods. Based on a simulation study, we find that some test statistics can identify particular sources of bias. Other test statistics are highly conservative, frequently rejecting the model even when the inferences are accurate and precise. We demonstrate our framework by analysing three large empirical data sets, and find that selecting data using our approach can lead to different phylogenetic inferences. Data identified as model-adequate by our approach produce more congruent inferences than model-inadequate data, a pattern that has also been reported in previous research. Filtering genomic data using the test statistics identified in our simulation study improves the reliability of inferences, and can be a useful tool for phylogenomic studies.
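
The filtering step described above can be illustrated with a minimal sketch. It assumes a common form of adequacy test: a test statistic computed on the empirical data for a locus is compared with the same statistic computed on data sets simulated under the fitted substitution model. The function names, the two-tailed rule, the 0.05 threshold, and the example values are hypothetical and chosen for illustration only; they are not taken from the framework presented in this talk.

```python
def tail_probability(empirical, simulated):
    """Two-tailed proportion of simulated statistics at least as
    extreme as the empirical statistic (illustrative rule only)."""
    n = len(simulated)
    lower = sum(s <= empirical for s in simulated) / n
    upper = sum(s >= empirical for s in simulated) / n
    return min(1.0, 2 * min(lower, upper))


def is_adequate(empirical, simulated, alpha=0.05):
    """Treat the model as adequate for a locus when the empirical
    statistic is not in the tails of the simulated distribution.
    The threshold alpha is an assumed default."""
    return tail_probability(empirical, simulated) >= alpha


def filter_loci(loci, alpha=0.05):
    """Keep the names of loci whose model passes the adequacy check.

    loci maps a locus name to a pair:
    (empirical statistic, list of simulated statistics).
    """
    return [
        name
        for name, (empirical, simulated) in loci.items()
        if is_adequate(empirical, simulated, alpha)
    ]


# Hypothetical values: simulated statistics 1..100 for both loci.
simulated_stats = list(range(1, 101))
loci = {
    "locus_A": (50, simulated_stats),   # empirical value is typical
    "locus_B": (200, simulated_stats),  # empirical value is extreme
}
kept = filter_loci(loci)
```

In this sketch, `kept` would contain only the loci whose empirical statistic falls within the range expected under the model. How the simulated data sets are generated, and which test statistic is used, are left open here.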