The eukaryotic protein kinase (ePK) domain mediates the majority of signaling and coordination of complex events in eukaryotes. By contrast, most bacterial signaling is thought to occur through structurally unrelated histidine kinases, though some ePK-like kinases (ELKs) and small molecule kinases are known in bacteria. Our analysis of the Global Ocean Sampling (GOS) dataset reveals that ELKs are as prevalent as histidine kinases and may play an equally important role in prokaryotic behavior. By combining GOS and public databases, we show that the ePK is just one subset of a diverse superfamily of enzymes built on a common protein kinase–like (PKL) fold. We explored this huge phylogenetic and functional space to shed light on the ancient evolution of this superfamily, its mechanistic core, and the structural basis for its observed diversity. We cataloged 27,677 ePKs and 18,699 ELKs, and classified them into 20 highly distinct families whose known members suggest regulatory functions. GOS data more than tripled the count of ELK sequences and enabled both the discovery of novel families and the classification and analysis of all ELKs. Comparison between and within families revealed ten key residues that are highly conserved across families. However, all but one of the ten residues have been eliminated in one family or another, indicating great functional plasticity. We show that loss of a catalytic lysine in two families is compensated by distinct mechanisms, both involving other key motifs. This diverse superfamily serves as a model for further structural and functional analysis of enzyme evolution.
The huge growth in sequence databases allows the characterization of every protein sequence by comparison with its relatives. Sequence comparisons can reveal both the key conserved functional motifs that define protein families and the variations specific to individual subfamilies, thus decorating any protein sequence with its evolutionary context. Inspired by the massive sequence trove from the Global Ocean Sampling (GOS) project, the authors looked in depth at the protein kinase–like (PKL) superfamily. Eukaryotic protein kinases (ePKs) are the pre-eminent controllers of eukaryotic cell biology and among the best studied of enzymes. By contrast, their prokaryotic relatives are far less well understood. The authors hoped both to characterize and better understand these prokaryotic enzymes and, by comparison, to provide insight into the core mechanisms of the eukaryotic protein kinases. The authors used remote homology methods, and bootstrapped on their discoveries to detect more than 45,000 PKL sequences. These clustered into 20 major families, of which the ePKs were just one. Ten residues are conserved between these families: six were known to be important in catalysis, but four more (including three highly conserved in ePKs) are still poorly understood, despite their ancient conservation. Extensive family-specific features were found, including the surprising loss of all but one of the ten key residues in one family or another. The authors explored some of these losses and found several cases in which changes in one key motif compensate for changes in another, demonstrating the plasticity of these sequences. Similar approaches can be used to better understand any other family of protein sequences.
Over 45,000 kinases, including 16,000 identified in the GOS expedition, were classified into 20 distinct families. This massive sequence comparison revealed a structural flexibility within eukaryotic protein kinases that helps explain their huge expansion in eukaryotes.