De novo protein-coding genes emerge from ancestral non-coding DNA, encode proteins distinct from those of known protein-coding genes, and may contribute substantially to species-specific traits. According to our previous study, most de novo protein-coding genes correspond to long non-coding RNAs (lncRNAs) in outgroup species, with similar transcript structures and correlated tissue expression profiles, which implies that some de novo protein-coding genes may originate from ancestral lncRNAs. However, although this view has been widely accepted, neither the transition process nor the functional importance of these genes has been well addressed to date.
Recently, we identified 64 hominoid-specific de novo genes and reported a mechanism for the origination of functional de novo proteins from ancestral lncRNAs. We showed that although the lncRNA precursors of de novo genes have precise splicing structures and specific tissue expression profiles, they are generally not more selectively constrained than other lncRNA loci. Moreover, the existence of these newly originated de novo proteins is consistent with neutral expectation: their theoretical lifespan is generally longer than their current age, because their GC-rich sequences yield stable ORFs with a lower chance of nonsense mutations. In other words, both the emergence and the retention of these de novo genes are likely driven by neutral forces. However, on the basis of the polymorphism profiles provided by RhesusBase (http://www.rhesusbase.org), we found signatures of purifying selection on these genes in the human population, which indicates that a proportion of these newly originated proteins are already functional in humans. Taken together, we propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts.
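
The link between GC-rich sequence and ORF stability can be illustrated with a toy calculation. The sketch below is not the lifespan estimate used in the study; it only counts, for a given in-frame ORF, what fraction of all possible single-nucleotide substitutions would turn a sense codon into a stop codon. The function name `nonsense_fraction`, the equal weighting of all substitutions, and the two example sequences are assumptions made purely for illustration.

```python
# Toy illustration (assumption: every single-nucleotide substitution is
# equally likely; real mutation spectra are biased and are not modeled here).
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def nonsense_fraction(orf: str) -> float:
    """Fraction of single-nucleotide substitutions in sense codons that
    create a stop codon. `orf` must be in frame (length divisible by 3)."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")

    hits = 0   # substitutions that produce a stop codon
    total = 0  # all substitutions considered
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            continue  # skip existing stop codons (e.g. the terminal one)
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if mutant in STOP_CODONS:
                    hits += 1
    return hits / total if total else 0.0


# Hypothetical example sequences, chosen only to contrast base composition.
gc_rich = "GCCGGCCCG"  # codons GCC, GGC, CCG
at_rich = "AAATTATAT"  # codons AAA, TTA, TAT

print(nonsense_fraction(gc_rich))
print(nonsense_fraction(at_rich))
```

Because all three stop codons (TAA, TAG, TGA) are AT-rich, codons made of G and C tend to sit further from a stop codon in mutational distance, so the GC-rich example yields a lower fraction than the AT-rich one under these simplifying assumptions.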