A growing number of research projects use ancient DNA (aDNA) to investigate scientific questions. Many of the available methods and pipelines for analyzing aDNA sequencing data are difficult to apply, requiring complex configuration and manual assembly of analysis tools. Modern sequencing technology also produces larger and more numerous datasets, which call for efficient and scalable analysis methods. To address these challenges, we introduce the EAGER pipeline.
EAGER provides state-of-the-art methods for quality control, mapping, authentication, contamination estimation and genotyping of next-generation sequencing (NGS) data in an accessible manner. Our pipeline incorporates several new methods for paired-end read merging, improved duplicate removal and mapping that are specifically tailored to improve the analysis output of aDNA projects. A graphical user interface (GUI) lets users configure the pipeline and hides much of the complexity of the analytical processes. The complete pipeline is distributed as a single Docker image that contains all required methods and tools, so users do not need to install the underlying tools independently. To further increase usability, the pipeline automatically generates extensive reports for each analysis run. These include important analysis statistics in Excel-compatible formats, which simplifies the assessment of whole-genome sequencing runs.
We have successfully used the pipeline in several projects on both bacterial and human genetic data. EAGER can reconstruct a human genome from an aDNA dataset of ~100 GB in less than one week. The pipeline is publicly available on GitHub and on our webpage.
EAGER can provide a well-defined standard for aDNA analysis. It specifically addresses the needs of labs with limited bioinformatics resources and minimizes both administrative and installation effort for users.