Oral Presentation Society for Molecular Biology and Evolution Conference 2016

“The data are good but the models are bad!”: examples from recent phylogenomic studies. (#95)

Frederic Delsuc 1
  1. Institut des Sciences de l'Evolution, CNRS - Université de Montpellier, Montpellier, France

When I was a postdoc with David Penny back in 2003, “The data are good but the models are bad!” was one of the exclamations I most frequently heard from him. At the time, we were focusing on reconstructing mammalian phylogenetic relationships from complete mitochondrial genomes. The results obtained from mitogenomes were at odds with those obtained from nuclear genes. We showed that this incongruence mainly stemmed from heterogeneities in the mitogenomic data that were not accounted for by the simple models of sequence evolution available at the time. This led us to propose simple data reduction procedures, such as RY-coding of nucleotide data, to alleviate both substitutional and compositional biases. Although far from ideal, such an approach had the advantage of avoiding phylogenetic reconstruction artifacts caused by model misspecification. With advances in statistical modeling, more complex models now make it possible to make sense of mitogenomic data. However, history has repeated itself with the development of phylogenomics, as we are still far from having adequate models that account for the full complexity of genomic data. Using examples from recent genome-scale studies, I will illustrate how David’s statement still applies to phylogenomic inference.
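
For readers unfamiliar with RY-coding, the following is a minimal illustrative sketch, not the code used in the studies referred to here. It recodes purines (A, G) as R and pyrimidines (C, T) as Y, so that transitions become invisible and only transversions remain informative. The function name `ry_code`, the treatment of U as a pyrimidine, the choice to keep `-` and `?` unchanged, and the mapping of any other symbol to `N` are all assumptions made for this example.

```python
# Hypothetical helper for illustration only: recode each nucleotide as
# purine (R) or pyrimidine (Y). Gaps and '?' are kept; anything else -> 'N'.
_RY_TABLE = {
    "A": "R", "G": "R", "R": "R",            # purines
    "C": "Y", "T": "Y", "U": "Y", "Y": "Y",  # pyrimidines
    "-": "-", "?": "?",                      # gap / missing data (assumed kept)
}


def ry_code(sequence: str) -> str:
    """Return the RY-coded version of a nucleotide sequence."""
    return "".join(_RY_TABLE.get(base, "N") for base in sequence.upper())


# Two made-up aligned sequences that differ only by transitions
# (A<->G and C<->T); the taxon names are placeholders.
aligned = {"taxon_a": "ATGCCGTA", "taxon_b": "GTACCATG"}
recoded = {name: ry_code(seq) for name, seq in aligned.items()}

for name, seq in recoded.items():
    print(name, seq)
```

In this toy alignment the two sequences become identical after recoding, because every difference between them is a transition. This is the sense in which RY-coding discards the fast-evolving, saturation-prone part of the signal and reduces differences in base composition to a single purine/pyrimidine ratio.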