The group of proteins that contain a thioredoxin (Trx) fold is large and diverse. Assessing the variation in the catalytic machinery of Trx fold proteins provides a foundation for understanding their functional diversity and for predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active site motif, that contribute to function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active site motif: only 56.8% of proteins have both cysteines, and no functional grouping has absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight into their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.
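To make the idea of counting modifications to the CxxC motif concrete, the following is a minimal sketch, not the procedure used in this work. It assumes that each sequence has already been reduced to the four-residue window aligned to the CxxC position; the function names, category labels, and toy windows are all hypothetical and are not data from the study.

```python
from collections import Counter


def classify_motif(window: str) -> str:
    """Classify a 4-residue window assumed to be aligned to the CxxC position."""
    if len(window) != 4:
        raise ValueError("expected a 4-residue window")
    residues = window.upper()
    first_cys = residues[0] == "C"
    last_cys = residues[3] == "C"
    if first_cys and last_cys:
        return "both cysteines"
    if first_cys:
        return "N-terminal cysteine only"
    if last_cys:
        return "C-terminal cysteine only"
    return "no cysteine"


def motif_summary(windows):
    """Return the fraction of windows that fall into each motif category."""
    counts = Counter(classify_motif(w) for w in windows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy windows for illustration only (not data from the study).
TOY_WINDOWS = ["CGPC", "CPYC", "CGFS", "TPVC", "SGFT"]

if __name__ == "__main__":
    for label, fraction in sorted(motif_summary(TOY_WINDOWS).items()):
        print(f"{label}: {fraction:.1%}")
```

In practice the window would come from a structure-guided or profile-based alignment, since scanning a raw sequence for any C-x-x-C pattern would also match unrelated cysteine pairs outside the active site.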
For any large class of proteins, far more protein sequences are known than can be examined experimentally. This is the case with the thioredoxin fold class, a large and diverse collection of proteins, some of which are known to catalyze important steps in metabolism. Others participate in key processes such as protein folding and detoxification of foreign compounds. Many of the unstudied proteins likely participate in other important biological processes and may have useful applications in medicine and industry. We used a new network-based computational approach to create similarity-based maps of the thioredoxin fold class. These maps juxtapose unstudied proteins with similar well-characterized proteins, helping to show where existing knowledge can help predict properties of uncharacterized sequences. This information can be used to identify which of these sequences merit experimental characterization. We also used the maps to gain insight into how shared structural features are used and modified to affect catalysis in the different subclasses, leading to a better understanding of the interplay between structure and function in the thioredoxin fold class.
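The general idea of a similarity-based map can be sketched as follows. This is an illustrative simplification under stated assumptions, not the pipeline used in this work: it assumes pairwise similarity scores are already available and that a higher score means greater similarity (alignment tools that report E-values use the opposite convention). The sequence names, scores, threshold, and annotations are hypothetical.

```python
from collections import defaultdict


def build_network(scores, threshold):
    """Connect two sequences when their similarity score meets the threshold."""
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        graph[a]  # touch both keys so isolated sequences remain as nodes
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return connected components, each as a sorted list of node names."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


def neighbor_annotations(graph, annotations):
    """For each unannotated node, collect annotations of its direct neighbors."""
    hints = {}
    for node in sorted(graph):
        if node not in annotations:
            found = {annotations[n] for n in graph[node] if n in annotations}
            hints[node] = sorted(found)
    return hints


# Hypothetical toy inputs for illustration only (not data from the study).
TOY_SCORES = {
    ("trx_A", "trx_B"): 0.9,
    ("trx_B", "unk_1"): 0.7,
    ("prx_A", "unk_2"): 0.8,
    ("trx_A", "prx_A"): 0.2,
}
TOY_ANNOTATIONS = {
    "trx_A": "thioredoxin",
    "trx_B": "thioredoxin",
    "prx_A": "peroxiredoxin",
}

if __name__ == "__main__":
    network = build_network(TOY_SCORES, threshold=0.5)
    print(clusters(network))
    print(neighbor_annotations(network, TOY_ANNOTATIONS))
```

Raising the threshold splits the map into smaller, more homogeneous groups, while lowering it merges groups; neighbor annotations gathered this way are hypotheses for experimental follow-up rather than functional assignments.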