Poster Presentation Society for Molecular Biology and Evolution Conference 2016

Alignment-free networks: One step further into the next generation phylogenomics (#538)

Guillaume Bernard 1 , Paul Greenfield 2 , Mark A Ragan 1 , Cheong Xin Chan 1
  1. Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
  2. CSIRO, Oceans and Atmosphere, North Ryde, NSW 1670, Australia

Genome evolution in microbes involves highly dynamic molecular mechanisms such as genome rearrangement and lateral genetic transfer (LGT). These mechanisms violate the implicit assumption of full-length contiguity in multiple sequence alignment (MSA), which is commonly used in phylogenetic analysis. Furthermore, MSA-based approaches necessitate heuristic methods, e.g. Bayesian inference, in reconstructing phylogenies, and these do not scale to the quantity of existing and forthcoming genome data. An alternative strategy, used by alignment-free (AF) methods, is to infer evolutionary relatedness based on shared subsequences of fixed length, known as k-mers. Here, using 143 complete genomes, we reconstruct a phylogenomic network using an AF approach (based on 25-mers). This AF network showcases the extent of shared genomic fragments among diverse phyla, e.g. the high extent of genetic exchange among Proteobacteria versus the low extent between Chlamydophilia and Cyanobacteria. Our observations largely agree with published studies, but also highlight the inadequacy of representing microbial phylogeny as a tree. AF approaches provide exact solutions (i.e. pairwise distances between genomes based on shared k-mers) that can be used directly in deriving a phylogenomic network. By varying the distance threshold, we can easily capture changes in the network structure, e.g. cliques, that can reflect the evolutionary dynamics of microbial genomes. The functional relevance of the different evolutionary scenarios, e.g. k-mers implicated in the formation of a clique of interest, can be inferred by correlating k-mer positions with gene annotation. To investigate the impact of plasmids and highly conserved genes on phylogenomic inference, we used 2,774 complete bacterial genomes to reconstruct AF phylogenomic networks from (a) all genome data including plasmids, (b) plasmid-only sequences, (c) chromosomal sequences without ribosomal RNAs, and (d) only ribosomal RNAs.
Here I present our recent findings from these analyses, and demonstrate how AF approaches can be applied to understand the evolutionary dynamics of microbial genomes using large-scale data.
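
The workflow described above (k-mer sets, pairwise distances, a thresholded network, clique detection) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the distance measure, so Jaccard distance on k-mer sets is used here as an assumed stand-in, and the function names, toy sequences, k = 4 (rather than 25), and thresholds are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; an assumed stand-in for the AF distance."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_network(genomes, k, threshold):
    """Connect two genomes when their distance is at most threshold."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = {name: set() for name in genomes}
    for u, v in combinations(genomes, 2):
        if jaccard_distance(profiles[u], profiles[v]) <= threshold:
            edges[u].add(v)
            edges[v].add(u)
    return edges


def maximal_cliques(edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & edges[v], x & edges[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(edges), set())
    return found


# Toy sequences (hypothetical, not real genomes); k=4 instead of 25.
genomes = {
    "A": "ACGTACGTGGCA",
    "B": "ACGTACGTGGCT",
    "C": "TTTTCCCCAAAA",
}

# Relaxing the threshold changes which cliques appear in the network.
for threshold in (0.3, 1.0):
    net = build_network(genomes, k=4, threshold=threshold)
    print(threshold, sorted(sorted(c) for c in maximal_cliques(net)))
```

Because every pairwise distance is computed exactly from the k-mer sets, the distances need to be calculated only once in practice; re-thresholding the same distance matrix is then enough to follow how cliques merge or split. At the scale of thousands of genomes, a dedicated k-mer counter and graph library would be more appropriate than this in-memory sketch.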