Linking gene family expansions to the functional evolution of proteins is an interesting challenge in evolutionary biology. Plants contain a large number of receptor-like protein kinases (RLKs) that allow them to sense and respond to changes in their environment. The cysteine-rich receptor-like protein kinases (CRKs) are distinguished from other RLKs by the structure of their extracellular region, which contains DUF26 domains (domain of unknown function 26; also known as the stress-antifung domain, PF01657). DUF26 domains are also found in two closely related gene families, the plasmodesmata-localized proteins (PDLPs) and the cysteine-rich receptor-like secreted proteins (CRSPs). The DUF26 domain is plant-specific and contains the conserved cysteine motif C-8X-C-2X-C.
To understand their evolution, we have identified and manually curated gene models for CRKs, PDLPs and CRSPs from more than 20 plant species covering most plant lineages. We can identify genes with DUF26 domains in land plants but not in the sequenced algal species. Our main interest is to understand why so many genes in these families are maintained after duplication events and why different phylogenetic subgroups have expanded in different plant lineages. Our data suggest that genes containing two DUF26 domains first appeared in the lycophytes. Intriguingly, in genes with two DUF26 domains, the first and second domains have differentiated into specific forms with unique sequence context surrounding the conserved cysteines. There is also considerable variation within DUF26 domains between the different phylogenetic subgroups of the CRKs in Arabidopsis. This variation might have functional and structural importance for the extracellular domain of the CRKs.
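As an illustration of the conserved cysteine motif C-8X-C-2X-C, the following minimal sketch scans a protein sequence for the motif and reports where it occurs, which gives a rough way to tell one-motif from two-motif sequences. It is not the method used in this study: the function name `find_cys_motifs` and the toy sequence are hypothetical, and a motif hit alone does not establish a DUF26 domain, which would normally be assigned with a profile search against PF01657.

```python
import re

# C-8X-C-2X-C: cysteine, any 8 residues, cysteine, any 2 residues, cysteine.
# "X" is taken here to mean any single-letter residue code (an assumption).
DUF26_CYS_MOTIF = re.compile(r"C[A-Z]{8}C[A-Z]{2}C")


def find_cys_motifs(protein_seq):
    """Return 0-based start positions of non-overlapping motif matches."""
    seq = protein_seq.upper()
    return [match.start() for match in DUF26_CYS_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence carrying two copies of the motif.
    toy = "MK" + "CAAAAAAAACGGC" + "LLLL" + "CSSSSSSSSCTTC" + "K"
    hits = find_cys_motifs(toy)
    print(f"{len(hits)} motif(s) at positions {hits}")
```

Counting hits per sequence in this way could serve as a quick sanity check on curated gene models before a proper domain search, but it will miss diverged motifs and may report chance matches.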