Oral Presentation Society for Molecular Biology and Evolution Conference 2016

protTrace: Predicting the evolutionary traceabilities for proteins and pathways (#208)

Arpit Jain 1, Prof Arndt von Haeseler 2, Prof Ingo Ebersberger 1
  1. Goethe University, Frankfurt am Main, Hessen, Germany
  2. CIBIV, University of Vienna, Vienna, Austria

The identification and functional characterization of protein networks concentrates on
a few, often only distantly related, model organisms. Integrating these individual insights into
a comprehensive picture of molecular and functional evolution is typically the subject of downstream
bioinformatics analyses. Here, gene sets of a broad and phylogenetically diverse collection of species
are screened for homologs to previously described pathway components. The resulting
phylogenetic profiles serve as a basis for inferences regarding the distribution of the corresponding
pathways across species, their evolutionary origins, and their functional diversification. Typically
ignored during this interpretation is the fact that the sensitivity of homolog searches
decreases as a function of evolutionary time. Gaps in phylogenetic profiles, especially for distantly
related or fast-evolving species, may therefore represent either the genuine absence of the
corresponding proteins or an artefact of limited sensitivity in the homolog search. Here we introduce
the concept of evolutionary traceability to facilitate informed interpretation of phylogenetic
profiles. We present a framework to compute, for a given protein and evolutionary time, the
probability of detecting a homolog by means of significant sequence similarity if it is present.
Specifically, we simulate protein sequence change over time, taking into account each protein’s specific
constraints on the evolutionary process. We monitor the decay of similarity to the original sequence and
determine the time at which significant similarity is no longer detected. Repeating the simulation
1,000 times and fitting a logistic growth curve to the observed data then yields, for each protein,
a detection probability as a function of time. Mapping this information onto a species tree
determines, for any protein of interest, whether or not sequence similarity is likely to suffice for
homolog identification in a given species. We have exemplified our approach by tracing the evolution of the S. cerevisiae genome across the three domains of life. Further, we resolved the bias in gene age estimates, enabling more reliable interpretation of phylogenetic profiles.
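
The simulate-and-fit idea described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the protTrace implementation: per-site substitution rates stand in for protein-specific constraints, "significant similarity" is reduced to a cap on the number of changed sites (max_changed), and the logistic curve is fitted by a simple least-squares regression on the logit scale. All function names, parameters, rates, and divergence times are hypothetical.

```python
import math
import random


def simulate_loss_time(site_rates, max_changed, rng):
    """Return the time at which more than `max_changed` sites have changed.

    Toy model (assumption): each site waits an exponentially distributed
    time for its first substitution; back-substitutions are ignored.
    """
    first_hits = sorted(rng.expovariate(rate) for rate in site_rates)
    return first_hits[max_changed]  # time of substitution number max_changed + 1


def fit_logistic(times, fractions):
    """Least-squares fit of logit(F) = k * (t - t_mid); returns (k, t_mid)."""
    pts = [(t, math.log(f / (1.0 - f)))
           for t, f in zip(times, fractions) if 0.0 < f < 1.0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    k = sxy / sxx
    return k, mean_t - mean_y / k


def traceability_curve(site_rates, max_changed, replicates=1000, seed=0):
    """Simulate loss times and return a function t -> detection probability."""
    rng = random.Random(seed)
    loss_times = sorted(simulate_loss_time(site_rates, max_changed, rng)
                        for _ in range(replicates))
    # Fraction of replicates that have lost detectable similarity by each time.
    fractions = [(i + 1) / replicates for i in range(replicates)]
    k, t_mid = fit_logistic(loss_times, fractions)

    def detection_probability(t):
        # One minus the fitted logistic growth curve; exponent clamped for safety.
        z = max(-700.0, min(700.0, k * (t - t_mid)))
        return 1.0 - 1.0 / (1.0 + math.exp(-z))

    return detection_probability


if __name__ == "__main__":
    # Hypothetical protein: 50 constrained sites and 50 fast-evolving sites.
    rates = [0.2] * 50 + [2.0] * 50
    detect = traceability_curve(rates, max_changed=70)
    # Hypothetical divergence times of target species from the seed species.
    for species, t in {"close_relative": 0.5, "distant_relative": 5.0}.items():
        print(species, round(detect(t), 3))
```

Evaluating the returned function at the divergence time of each species in a tree gives a per-species detection probability. A gap in a phylogenetic profile is more plausibly a genuine absence where that probability is high, and may reflect limited search sensitivity where it is low.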