The defensins are ancient, diverse and fast-evolving cysteine-rich proteins found across the eukaryotes. They variously display antimicrobial, signalling and ion channel disruption activities. However, their highly divergent sequences have caused traditional methods of sequence analysis to fail, hampering our understanding of how such proteins evolve and how to engineer their activities. To address this shortfall, we have applied structure-based analyses and developed new methods for aligning cysteine-rich proteins and for building quantitative maps of protein sequence space.
Through these methods we have shown that the defensins consist of two independent superfamilies that have undergone some of the most extreme convergent evolution currently known for protein sequence, structure and function. The use of disulphides to display loop sequences makes the defensins highly evolvable, but also imposes evolutionary constraints which have funnelled their convergence.
Structural homology is used to guide multiple sequence alignments by barcoding cysteines known to be genuinely homologous. We use these alignments to generate quantitative maps of protein sequence space. Using multivariate analysis to rotate and project these very high-dimensional spaces into a low-dimensional, human-understandable space reveals naturally occurring clusters of sequences with similar biophysical properties. Finally, this allows us to mine the existing diversity generated by evolution to design cluster-central ‘archetypal’ sequences, somewhat analogous to ancestral sequence reconstruction, for engineering increased activity, promiscuity and stability.
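The mapping steps described above can be illustrated with a minimal sketch. It is not the method used in this work, and everything specific in it is an assumption made for illustration: the toy alignment is invented, each aligned position is encoded by just two properties (a crude net charge and Kyte-Doolittle hydropathy), principal component analysis stands in for the unspecified multivariate analysis, a two-cluster k-means stands in for cluster detection, and the natural sequence closest to each cluster centre stands in for a designed archetype. The alignment itself is assumed to have been produced already.

```python
import numpy as np

# Hypothetical toy alignment (invented sequences); '-' marks a gap.
ALIGNMENT = [
    "CKRGC-ARC", "CKKGC-ARC", "CRRGCTARC",   # cationic-looking loops
    "CDELCSEEC", "CDDLCSEDC", "CEELC-DEC",   # anionic-looking loops
]

# Assumed property scales: crude side-chain charge and Kyte-Doolittle hydropathy.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(alignment):
    """Turn each aligned sequence into a vector of per-position properties."""
    rows = []
    for seq in alignment:
        feats = []
        for aa in seq:  # gaps and unknown residues are encoded as (0, 0)
            feats.extend([CHARGE.get(aa, 0.0), HYDROPATHY.get(aa, 0.0)])
        rows.append(feats)
    return np.array(rows, dtype=float)

def pca_project(X, n_components=2):
    """Centre the data, then rotate and project it onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def two_means(points, n_iter=20):
    """Two-cluster k-means with a deterministic farthest-point initialisation."""
    first = points[0]
    far = points[np.argmax(np.linalg.norm(points - first, axis=1))]
    centres = np.array([first, far])
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):  # keep the old centre if a cluster empties
                centres[k] = points[labels == k].mean(axis=0)
    return labels, centres

def archetypes(alignment, points, labels, centres):
    """Pick, per cluster, the natural sequence closest to the cluster centre."""
    chosen = {}
    for k, centre in enumerate(centres):
        members = np.flatnonzero(labels == k)
        dists = np.linalg.norm(points[members] - centre, axis=1)
        chosen[k] = alignment[members[np.argmin(dists)]]
    return chosen

X = encode(ALIGNMENT)
coords = pca_project(X)
labels, centres = two_means(coords)
arch = archetypes(ALIGNMENT, coords, labels, centres)

for seq, (pc1, pc2), lab in zip(ALIGNMENT, coords, labels):
    print(f"{seq}  PC1={pc1:+.2f}  PC2={pc2:+.2f}  cluster={lab}")
print("cluster-central sequences:", arch)
```

In this sketch the cluster-central sequence is an existing sequence rather than a newly designed one; a per-position consensus or centroid-derived sequence would be a closer analogue of an archetype, at the cost of a decoding step from property space back to residues.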
The techniques developed for this particularly difficult case are applicable to other protein superfamilies, and complement established sequence analysis methods.