Poster Presentation Society for Molecular Biology and Evolution Conference 2016

EAGER: Efficient Ancient Genome Reconstruction (#305)

Alexander Peltzer 1,2, Günter Jäger 1, Alexander Herbig 2, Alexander Seitz 1, Christian Kniep, Johannes Krause 2, Kay Nieselt 1
  1. Integrative Transcriptomics, Eberhard Karls Universitaet, Tuebingen, Baden-Wuerttemberg, Germany
  2. Max Planck Institute for the Science of Human History, Jena, Thuringia, Germany

A growing number of research projects investigate scientific questions using ancient DNA (aDNA). Many of the available methods and pipelines for analyzing aDNA sequencing data are difficult to apply and require complex configuration and manual assembly of analysis tools. Modern sequencing technology produces more and larger datasets, which call for efficient and scalable analysis methods. To address these challenges, we introduce the EAGER pipeline.

EAGER provides state-of-the-art methods for quality control, mapping, authentication, contamination estimation and genotyping of next-generation sequencing (NGS) data in an accessible manner. Our pipeline incorporates several new methods for paired-end read merging, improved duplicate removal and mapping that are specifically tailored to improve the analysis output of aDNA projects. A graphical user interface (GUI) lets users configure the pipeline and hides much of the complexity of the analytical processes. The complete pipeline is distributed as a single Docker image that contains all required methods and tools, so users do not need to install the underlying tools independently. To further increase usability, the pipeline automatically generates extensive reports of each analysis run. These include important analysis statistics in Excel-compatible formats, which simplifies the assessment of whole-genome sequencing runs.

We have successfully used the pipeline in several projects on both bacterial and human genetic data. EAGER can reconstruct the genome from an ancient human aDNA dataset of ~100 GB in less than one week. The pipeline is publicly available on GitHub and our webpage.

EAGER can provide a well-defined standard for aDNA analysis. It specifically addresses the needs of labs with limited bioinformatics resources and minimizes both administrative and installation effort for users.
