Poster Presentation Society for Molecular Biology and Evolution Conference 2016

Phylogenomics analysis of large bacterial phylogenies using whole genome information (#604)

Minh Duc Cao 1, Lachlan Coin 1
  1. The University of Queensland, Brisbane, QUEENSLAND, Australia

Advances in high-throughput sequencing have transformed microbial research by allowing the whole genomes of thousands of microorganisms to be sequenced in a single study. Evolutionary analysis of these microorganisms is a crucial step in many such studies. Classical phylogenetic analyses based on single genes or parts of the genome are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. Recent phylogenetic studies focus on establishing the phylogeny from core SNPs and hence are limited to the analysis of closely related organisms. Here we present an information-theoretic method that estimates the genetic distance between two bacterial organisms from the mutual information of their genomic sequences. The method employs an adaptive local alignment model to identify homologous regions and to quantify all variation in a single unit of information. As a consequence, our method can account for a range of diversity among the taxa as well as different types of variation such as SNPs, indels and rearrangements. We demonstrate the robustness of our method by building a phylogeny of 2000 bacterial organisms from 12 pathogenic bacterial species using their draft genome assemblies. We found that all taxa from the same species were correctly grouped together and that the placements of these species were in complete agreement with the current bacterial taxonomy. The subtree for each species was also largely congruent with the tree built from the multi-locus sequence typing (MLST) profiles of the species.
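
The idea of a distance derived from mutual information can be illustrated with a minimal sketch. It is not the method presented here: the adaptive local alignment model is replaced by the general-purpose zlib compressor as a crude stand-in for estimating information content, and the normalisation in `distance` is an assumption made for illustration. All function names (`info_bits`, `mutual_info_bits`, `distance`, `random_genome`, `mutate`) are hypothetical.

```python
import random
import zlib


def info_bits(seq: str) -> int:
    """Approximate information content of seq in bits.

    ASSUMPTION: zlib stands in for the adaptive local alignment model.
    """
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))


def mutual_info_bits(x: str, y: str) -> int:
    """Estimate I(x; y) = I(x) - I(x | y), with I(x | y) ~ I(y + x) - I(y)."""
    conditional = info_bits(y + x) - info_bits(y)
    return info_bits(x) - conditional


def distance(x: str, y: str) -> float:
    """Illustrative distance: 1 minus the shared fraction of information.

    ASSUMPTION: this normalisation is chosen for the sketch only.
    """
    shared = mutual_info_bits(x, y) + mutual_info_bits(y, x)
    total = info_bits(x) + info_bits(y)
    return min(1.0, max(0.0, 1.0 - shared / total))


def random_genome(length: int, rng: random.Random) -> str:
    """Generate a toy random DNA sequence."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with a different base with probability rate."""
    return "".join(
        rng.choice("ACGT".replace(base, "")) if rng.random() < rate else base
        for base in seq
    )


# Toy data: sequences are kept well below zlib's 32 KiB window so that
# matches between the two concatenated sequences can be found.
rng = random.Random(0)
ancestor = random_genome(5000, rng)
close_relative = mutate(ancestor, 0.01, rng)  # about 1% SNPs
unrelated = random_genome(5000, rng)

print("close relative:", distance(ancestor, close_relative))
print("unrelated:     ", distance(ancestor, unrelated))
```

On this toy data the close relative should score a smaller distance than the unrelated sequence. A pairwise matrix of such distances could then be passed to a distance-based tree-building method such as neighbour joining. Real genomes far exceed the zlib window, so this stand-in illustrates only the principle.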