Genomes of Escherichia coli, including that of the pathogen enterohemorrhagic E. coli O157:H7 (EHEC), still harbor undetected protein-coding genes that have apparently escaped annotation because of their small size and non-essential functions. However, such genes might be important for adaptation and evolution. Mass spectrometry (MS)-based proteomics is a sensitive, high-throughput method that directly measures the presence of polypeptides. An emerging technique for determining translational activity is ribosomal footprinting (Ribo-seq), which measures the ribosome coverage of mRNAs and thereby captures the translatome.
A quantitative comparison of 52 well MS-measurable EHEC proteins with Ribo-seq translatome data yielded a strong positive Pearson correlation (rP = 0.84). However, the correlation between MS and Ribo-seq data drops substantially when all proteins are included. This could be attributed to proteins whose MS detection is biased by "non-standard" protein properties, e.g., low abundance, too few or too many tryptic cleavage sites, or high hydrophobicity, as well as to differences in protein half-life. Almost all (98%) of the proteins detected by MS displayed a strong translatome signal, but many proteins not detected by MS also showed a significant Ribo-seq signal for their mRNA. A number of non-annotated genes were found, some of which are annotated in other Enterobacteriaceae; several, however, are novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and the same proteins randomly shuffled 100 times. Based on this comparison, about 85% of the novel proteins exhibit predicted structural and functional features similar to those of annotated proteins.
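The rP value above is a Pearson correlation coefficient over paired per-protein measurements. A minimal sketch of that calculation follows. The input numbers are hypothetical placeholders, not data from this study, and the abstract does not state which MS and Ribo-seq quantities were paired or whether they were transformed (e.g., log-scaled) first. The second call illustrates, with one invented protein that is well translated but poorly detected by MS, how a detection-biased protein can pull the coefficient down.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (covariance numerator)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each sample
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-protein signals (NOT data from this study).
ms_signal = [1.0, 2.0, 3.0, 4.0]
ribo_signal = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(ms_signal, ribo_signal))  # prints 1.0 (perfectly linear pairs)

# One extra hypothetical protein with a high Ribo-seq signal but a low
# MS signal lowers the coefficient sharply (to about 0.22 here).
print(pearson_r(ms_signal + [0.5], ribo_signal + [9.0]))
```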
These findings demonstrate that ribosomal footprinting can be used to detect novel protein-coding genes, adding to the growing body of evidence that such weakly expressed genes are not mere artifacts. Ribo-seq thus offers an additional way to detect and analyze novel genes, which is valuable because most of those found here are taxonomically restricted and therefore appear to have evolved de novo relatively recently.