Poster Presentation Society for Molecular Biology and Evolution Conference 2016

A transcriptome annotation pipeline for non-model organisms (#542)

Fabrizio Ghiselli, Mariangela Iannello, Emanuele Procopio, Marco Passamonti

The introduction of high-throughput sequencing technologies has allowed researchers to generate large amounts of genomic data quickly and at limited cost. This opportunity has had a groundbreaking impact on the study of non-model organisms: above all, RNA-Seq and de novo transcriptome assembly are a valuable source of information for species in which genomic resources are scarce or absent. However, sequencing and assembly are only the first steps, and accurate annotation is fundamental to any downstream biological analysis. Annotating transcriptomes from model organisms and their closely related species is fairly straightforward and generally relies on simple sequence similarity searches. Conversely, non-model organisms require more complex and integrated procedures to infer remote homology and function. We present a pipeline specifically designed for annotating transcriptomes of non-model organisms. It takes an integrated approach that combines different bioinformatics tools to achieve: 1) removal of contaminant sequences; 2) ORF prediction and identification of pseudogenes and artificially fused transcripts; 3) coding sequence annotation based both on sequence similarity and on the identification of conserved domains by protein signature recognition; 4) functional annotation of coding sequences through the assignment of GO terms; 5) identification of orthologs; 6) annotation of noncoding transcripts. We tested our pipeline by re-annotating the transcriptome of Ruditapes philippinarum (Bivalvia, Veneridae).
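
To make step 2 (ORF prediction) concrete, the following is a minimal sketch of a naive six-frame ORF search. It is an illustration only and is not the tool used in the pipeline, which the abstract does not name. The function names (reverse_complement, longest_orf) and the definition of an ORF as the span from the first ATG to the next in-frame stop codon are assumptions made for this example; real ORF predictors also handle partial ORFs, ambiguous bases, and coding-potential scoring.

```python
# Illustrative sketch only: a naive longest-ORF finder, NOT the pipeline's actual tool.
# Assumption: an ORF runs from the first ATG to the next in-frame stop codon (inclusive).

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only are complemented)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF found in any of the six reading frames.

    Returns an empty string when no complete ORF is present.
    """
    seq = seq.upper()
    best = ""
    # Scan the forward strand and its reverse complement, three frames each.
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the currently open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # close this ORF and look for the next ATG
    return best


if __name__ == "__main__":
    # Hypothetical toy transcript, used only to exercise the function.
    print(longest_orf("CCATGAAATTTTAGGG"))
```

In a setting like the one described above, a transcript with no complete ORF on either strand (an empty result here) would be a candidate for the noncoding or pseudogene checks rather than for coding sequence annotation.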