Oral Presentation Society for Molecular Biology and Evolution Conference 2016

Assessing Methods for Outlier Detection in Phylogenetic Inference (#5)

Daisy Shepherd 1, Steffen Klaere 1, Jale Basten, Ole Geldschlager
  1. Department of Statistics, The University of Auckland, Auckland, New Zealand

Different sites in an alignment can support different phylogenetic trees. Deep phylogenies in particular tend to suffer from outlier or saturated sites, which can drastically change the favoured topology. In some situations, these sites can even drive the inference toward the wrong topology.

To avoid systematic errors caused by saturation, current approaches propose removing saturated sites to improve the stability of tree inference. These methods use per-site statistics to decide which sites to remove before the phylogenetic analysis is performed.

A number of studies have applied these methods to alignment data prone to saturation. Their results share a common finding: removing saturated sites changed the preferred phylogenetic tree.

However, these methods leave two questions open: (1) it is not clear how they fare when the data contain no systematic error, and (2) it is not clear whether the number of sites declared saturated is appropriate. In our study, we addressed the first question by simulating "good" alignments and investigating the performance of the proposed statistics. We addressed the second by comparing which sites were declared saturated under each statistic, using the discrete nature of the OV distance to visualise the scores.