
      Classification of Intrinsically Disordered Regions and Proteins




1 Introduction

1.1 Uncharacterized Protein Segments Are a Source of Functional Novelty

Over the past decade, we have observed a massive increase in the amount of information describing protein sequences from a variety of organisms. 1,2 While this may reflect the diversity in sequence space, and possibly also in function space, 3 a large proportion of the sequences lacks any useful function annotation. 4,5 Often these sequences are annotated as putative or hypothetical proteins, and for the majority their functions still remain unknown. 6,7 Suggestions about potential protein function, primarily molecular function, often come from computational analysis of their sequences. For instance, homology detection allows for the transfer of information from well-characterized protein segments to those with similar sequences that lack annotation of molecular function. 8−10 Other aspects of function, such as the biological processes proteins participate in, may come from genetic- and disease-association studies, expression and interaction network data, and comparative genomics approaches that investigate genomic context. 11−17 Characterization of unannotated and uncharacterized protein segments is expected to lead to the discovery of novel functions as well as provide important insights into existing biological processes. In addition, it is likely to shed new light on molecular mechanisms of diseases that are not yet fully understood. Thus, uncharacterized protein segments are likely to be a large source of functional novelty relevant for discovering new biology.

1.2 Structure–Function Paradigm Enhances Function Prediction

Traditionally, protein function has been viewed as critically dependent on the well-defined and folded three-dimensional structure of the polypeptide chain.
This classical structure–function paradigm (Figure 1; left panel) has mainly been based on concepts explaining the specificity of enzymes, and on structures of folded proteins that have been determined primarily using X-ray diffraction on protein crystals. The classical concept implies that protein sequence defines structure, which in turn determines function; that is, function can be inferred from the sequence and its structure. Even when protein sequences diverge during evolution, for example, after gene duplication, the overall fold of their structures remains roughly the same. Therefore, structural similarity between proteins can reveal distant evolutionary relationships that are not easily detectable using sequence-based methods. 18,19 Structural genomics efforts such as the Protein Structure Initiative (PSI) have been set up to enlarge the space of known protein folds and their functions, thereby complementing sequence-based methods in an attempt to fill the gap of sequences for which there is no function annotation. 20,21 Specifically, phase two of the PSI aimed to structurally characterize proteins and protein domains of unknown function, often providing the first hypothesis about their function and serving as a starting point for their further characterization.

1.3 Classification Further Facilitates Function Prediction

Classification schemes provide a guideline for systematic function assignment to proteins. Generally, proteins are made up of a single or multiple domains that can have distinct molecular functions. These domains, which are referred to as structured domains, often fold independently, make precise tertiary contacts, and adopt a specific three-dimensional structure to carry out their function. The sequences that compose structured domains can be organized into families of homologous sequences, whose members are likely to share a common evolutionary relationship and molecular function.
The Pfam database classifies known protein sequences and contains almost 15 000 such families, for most of which there is some understanding about the function. 22 Nevertheless, Pfam also contains more than 3000 families annotated as domains of unknown function, or DUFs. 23 These families are largely made up of hypothetical proteins and await function annotation. Another powerful example of a protein classification scheme is the Structural Classification of Proteins (SCOP), which provides a means of grouping proteins with known structure together, based on their structural and evolutionary relationships. 24,25 SCOP utilizes a hierarchical classification consisting of four levels, (i) family, (ii) superfamily, (iii) fold, and (iv) class, with each level corresponding to different degrees of structural similarity and evolutionary relatedness between members. Using this scheme, function of newly solved structures or sequences can be inferred from their similarity with existing protein classes through structure or sequence comparisons, for instance, as available via the SUPERFAMILY database. 10 In this direction, another major initiative is Genome3D, which is a collaborative project to annotate genomic sequences with predicted 3D structures based on CATH 26 (Class, Architecture, Topology, Homology) and SCOP 24,25 domains to infer protein function. 27

1.4 Intrinsically Disordered Regions and Proteins

While many proteins need to adopt a well-defined structure to carry out their function, a large fraction of the proteome of any organism consists of polypeptide segments that are not likely to form a defined three-dimensional structure, but are nevertheless functional. 28−42 These protein segments are referred to as intrinsically disordered regions (IDRs; Figure 1; right panel).
43 Because IDRs generally lack bulky hydrophobic amino acids, they are unable to form the well-organized hydrophobic core that makes up a structured domain 31,44 and hence their functionality arises in a different manner as compared to the classical structure–function view of globular, structured proteins. In this framework, protein sequences in a genome can be viewed as modular because they are made up of combinations of structured and disordered regions (Figure 1; bottom panel). Proteins without IDRs are called structured proteins, and proteins with entirely disordered sequences that do not adopt any tertiary structure are referred to as intrinsically disordered proteins (IDPs). The majority of eukaryotic proteins are made up of both structured and disordered regions, and both are important for the repertoire of functions that a protein can have in a variety of cellular contexts. 43 Traditionally, IDRs were considered to be passive segments in protein sequences that “linked” structured domains. However, it is now well established that IDRs actively participate in diverse functions mediated by proteins. For instance, disordered regions are frequently subjected to post-translational modifications (PTMs) that increase the functional states in which a protein can exist in the cell. 45,46 In addition, they expose short linear peptide motifs of about 3–10 amino acids that permit interaction with structured domains in other proteins. 47,48 These two features in isolation or in combination permit the interaction and recruitment of diverse proteins in space and time, thereby facilitating regulation of virtually all cellular processes. 47 The prevalence of IDRs in any genome (see, for example, the D2P2 database, 49 Box 1) in combination with their unique characteristics means that these regions extend the classical view of the structure–function paradigm and hence that of protein function. 
Thus, functional regions in proteins can either be structured or disordered, and these need to be considered as two fundamental classes of functional building blocks of proteins. 50

Figure 1 Structured domains and intrinsically disordered regions (IDRs) are two fundamental classes of functional building blocks of proteins. The synergy between disordered regions and structured domains increases the functional versatility of proteins. Adapted with permission from ref (50). Copyright 2012 American Association for the Advancement of Science.

1.5 The Need for Classification of Intrinsically Disordered Regions and Proteins

IDRs and IDPs are prevalent in eukaryotic genomes. For instance, 44% of human protein-coding genes contain disordered segments of >30 amino acids in length 49 (similar data shown in Figure 2A). In the human genome, 6.4% of all protein-coding genes do not have any function annotation in their description in Ensembl 1 (Figure 2B). Further investigation using the D2P2 database of disorder in genomes 49 revealed that most of these genes with no function annotation encode at least some disorder (Figure 2B) and that genes with no annotation contain proportionally more IDRs (Figure 2C). Given the absence of structural constraints, IDRs tend to evolve more rapidly than protein domains that adopt defined structures. 51−56 As a result, identifying homologous regions is harder for IDRs and IDPs than it is for structured domains. This complicates the transfer of information about function between homologues and thus the prediction of function of IDRs and IDPs. Furthermore, much of protein annotation is based on information on sequence families and structured domains. However, less than one-half of all residues in the human proteome fall within such domains (Figure 3). Not only do most residues of human proteins fall outside domains, a large fraction of these residues are also disordered (Figure 3A and B, right bars).
Moreover, although it is expected that SUPERFAMILY domains based on known protein structures have very little disorder (Figure 3A, left bar), Pfam domains based on sequence clustering do not contain much more (Figure 3B, left bar). These observations suggest that there is a large pool of protein segments that are not considered by conventional protein annotation methods, because the sequences of disordered regions are difficult to align, or because the methods do not explicitly consider disordered and nondomain regions of the protein sequence. Taken together, these considerations raise the need to devise a classification scheme specifically for disordered regions in proteins that may enhance the function prediction and annotation for this important class of protein segments.

Figure 2 The number of protein-coding genes in the human genome with various amounts of disorder. Histograms of the numbers of human genes with annotation (A) and without annotation (B), grouped by the percentage of disordered residues. (C) A comparison of the fraction of annotated and unannotated human genes with different amounts of disorder. Residues in each protein are defined as disordered when there is a consensus between >75% of the predictors in the D2P2 database 49 at that position. The set of human genes was taken from Ensembl release 63, 1 and the representative protein coded for by the longest transcript was used in each case. The annotation was taken from the description field with “open reading frame”, “hypothetical”, “uncharacterized”, and “putative protein” treated as no annotation.

Figure 3 The fraction of disordered residues located in domains in human protein-coding genes: (A) residues inside (left) and outside (right) of SCOP domains, 24 and (B) residues inside (left) and outside (right) of Pfam domains (only curated Pfam domains were considered, i.e., Pfam-A). 22 The SCOP domains in human proteins are defined by the SUPERFAMILY database. 10 Disordered residues were taken from the D2P2 database 49 (when there is a consensus between >75% of the disorder predictors). The set of human genes was taken from Ensembl release 63. 1

In this Review, we synthesize and provide an overview of the various classifications of intrinsically disordered regions and proteins that have been put forward in the literature since the start of systematic studies into their function some 15 years ago. We discuss approaches based on function, functional elements, structure, sequence, protein interactions, evolution, regulation, and biophysical properties (Table 1). Finally, we discuss resources that are currently available for gaining insight into IDR function (Table 2), we suggest areas where increased efforts are likely to advance our understanding of the functions of protein disorder, and we speculate how combinations of multiple existing classification schemes could achieve high quality function prediction for IDRs, which should ultimately lead to improved function coverage and a deeper understanding of protein function.
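
The consensus rule quoted in the captions of Figures 2 and 3 (a residue counts as disordered when >75% of the predictors agree at that position) can be illustrated with a minimal Python sketch. The function names, the input layout (one list of per-residue boolean calls per predictor), and the example data are assumptions made for illustration only; they do not reflect the actual D2P2 data format or pipeline. The sketch also derives the two summary quantities used in this section: the percentage of disordered residues and the length of the longest disordered stretch (to be compared with the >30 residue cutoff).

```python
from typing import List, Sequence


def consensus_disorder(calls: Sequence[Sequence[bool]],
                       threshold: float = 0.75) -> List[bool]:
    """Per-residue consensus: True where strictly more than `threshold`
    of the predictors call the residue disordered.

    calls[p][i] is True if (hypothetical) predictor p calls residue i disordered.
    """
    if not calls:
        return []
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all predictors must cover the same residues")
    n_predictors = len(calls)
    return [
        sum(bool(c[i]) for c in calls) / n_predictors > threshold
        for i in range(length)
    ]


def percent_disordered(mask: Sequence[bool]) -> float:
    """Percentage of residues flagged as disordered in a consensus mask."""
    return 100.0 * sum(mask) / len(mask) if mask else 0.0


def longest_disordered_stretch(mask: Sequence[bool]) -> int:
    """Length of the longest run of consecutive disordered residues."""
    best = run = 0
    for is_disordered in mask:
        run = run + 1 if is_disordered else 0
        best = max(best, run)
    return best


# Illustrative (made-up) calls from four predictors on a five-residue peptide.
example_calls = [
    [True, True, True, False, False],
    [True, True, False, False, False],
    [True, True, True, False, True],
    [True, False, True, False, False],
]
mask = consensus_disorder(example_calls)
has_long_idr = longest_disordered_stretch(mask) > 30  # cutoff used in the text
```

Note that the comparison is strict, following the ">75%" wording: a residue supported by exactly three of four predictors is not counted as disordered in this sketch.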
Table 1 Classifications of Intrinsically Disordered Regions and Proteins basis for classification classes description examples function (33,39,57,58) •entropic chains IDRs carrying out functions that benefit directly from their conformational disorder, e.g., flexible linkers and spacers MAP2 projection domain, titin PEVK domain, RPA70, MDA5 •display sites flexibility of IDRs facilitates exposure of motifs and easy access for proteins that introduce and read PTMs p53, histone tails, p27, CREB kinase-inducible domain •chaperones their binding properties (many different partners, rapid association/disassociation, and folding upon binding) make IDPs suitable for chaperone functions hnRNP A1, GroEL, α-crystallin, Hsp33 •effectors folding upon binding mechanics allow effectors to modify the activity of their partner proteins p21, p27, calpastatin, WASP GTPase-binding domain •assemblers assembling IDRs have large binding interfaces that scaffold multiple binding partners and promote the formation of higher-order protein complexes ribosomal proteins L5, L7, L12, L20, Tcf 3/4, CREB transactivator domain, Axin •scavengers disordered scavengers store and neutralize small ligands chromogranin A, Pro-rich glycoproteins, caseins and other SCPPs functional features linear motifs 47,125 •structural modification sites of conformational alteration of a peptide backbone peptidylprolyl cis–trans isomerase Pin1 sites   •proteolytic cleavage sites of post-translational processing events or proteolytic cleavage scission sites Caspase-3/-7, separase, taspase1 scission sites   •PTM removal/addition specific binding sequences that recruit enzymes catalyzing PTM moiety addition or removal cyclin-dependent kinase phosphorylation site, SUMOylation site, N-glycosylation site   •complex promoting motifs that mediate protein–protein interactions important for complex formation; often associated with signal transduction proline-rich SH3-binding motif, cyclin box, pY SH2-binding motif, PDZ-binding 
motif, TRAF-binding motifs in MAVS   •docking motifs that increase the specificity and efficiency of modification events by providing an additional binding surface KEN box degron, MAPK docking sites   •targeting or trafficking signal sites that localize proteins within particular subcellular organelles or act to traffic proteins nuclear localization signal, clathrin box motif, endocytosis adaptor trafficking motifs molecular recognition features (MoRFs) 121 •alpha disordered motifs that form α-helices upon target binding p53 ∼ Mdm2, p53 ∼ RPA70, p53 ∼ S100B(ββ), RNase E ∼ enolase, inhibitor IA3 ∼ proteinase A   •beta disordered motifs that form β-strands upon target binding RNase E ∼ polynucleotide phosphorylase, Grim ∼ DIAP1, pVIc ∼ adenovirus 2 proteinase   •iota disordered motifs that form irregular secondary structure upon target binding p53 ∼ Cdk2-cyclin A, amphiphysin ∼ α-adaptin C   •complex disordered motifs that contain combinations of different types of secondary structure upon target binding amyloid β A4 ∼ X11, WASP ∼ Cdc42 intrinsically disordered domains (IDDs) 158,159   some protein domains identified using sequence-based approaches are fully or largely disordered WH2, RPEL, BH3, KID domains co-occurrence of protein domains with disordered regions 161,162   particular disordered regions frequently co-occur in the same sequence with specific protein domains   structure structural continuum 37   proteins function within a continuum of differently disordered conformations, extending from fully structured to completely disordered, with everything in between and no strict boundaries between the states   protein quartet 32,34,166 •intrinsic coil flexible regions of extended conformation with hardly any secondary structure; high net charge differentiates these from disordered globules ribosomal proteins L22, L27, 30S, S19, prothymosin α   •pre-molten globule disordered protein regions with residual secondary structure, often poised for folding upon binding 
events; lower net charge makes them more compact than coils Max, ribosomal proteins S12, S18, L23, L32, calsequestrin   •molten globule globally collapsed conformation with regions of fluctuating secondary structure nuclear coactivator binding domain of CREB binding protein   •folded structured proteins with a defined three-dimensional structure most enzymes, transmembrane domains, hemoglobin, actin sequence sequence–structural ensemble relationships 166,204 •polar tracts sequence stretches enriched in polar amino acids often form globules that are generally devoid of significant secondary structure preferences Asn- and Gly-rich sequences, Gln-rich linkers in transcription factors and RNA-binding proteins   •polyelectrolytes amino acid compositions biased toward charged residues of one type; strong polyelectrolytes (high net charge) form expanded coils Arg-rich protamines, Glu/Asp-rich prothymosin α   •polyampholytes sequences with roughly equal numbers of positive and negative charges; conformations of polyampholytes are governed by the linear distribution of oppositely charged residues, with segregation of opposite charges leading to globules, while well-mixed charged sequences adopt random-coil or globular conformations, depending on the total charge RNA chaperones, splicing factors, titin PEVK domain, yeast prion Sup35 prediction flavors 205 •V predicted best by the VL-2V predictor, for which the hydrophobic amino acids are the most influential attributes E. 
coli ribosomal proteins   •C VL-2C is the best predictor for flavor C, which has more histidine, methionine, and alanine residues than the other flavors poly- and oligosaccharide binding domains   •S flavor with less histidine than the others, best predicted by predictor VL-2S, which has a measure of sequence complexity as the most important attribute proteins that facilitate binding and interaction disorder–sequence complexity 206   IDPs from different functional classes show distinct disorder–sequence complexity distributions proteins with disordered linkers between structured domains populate compact and disordered DC regions overall degree of disorder 35,51,68,161,208,209 •fraction categorization of proteins based on the fraction of residues predicted to be disordered 0–10/10–30/30–100% disorder   •overall score overall disorder scores for the whole protein minimum average disorder score depending on the predictor   •continuous stretches presence or absence of continuous stretches of disordered residues typically >30 residues length of disordered regions 211 •>500 residues proteins that contain disordered regions of different lengths are enriched for different types of functions transcription   •300–500 residues   kinase and phosphatase functions   •<50 residues   (metal) ion binding, ion channels, GTPase regulatory activity position of disordered regions 211 •N-terminal proteins that contain disordered regions at different locations in the sequence are enriched for different types of functions DNA-binding, ion channel   •internal   transcription regulator, DNA-binding   •C-terminal   transcription repressor/activator, ion channel tandem repeats 217,218 •Q/N glutamine- and asparagine-rich protein regions are both important for normal cellular function and prone to cause harmful aggregation huntingtin, Sup35p, Ure2p, Ccr4, Pop2   •S/R tandem repeats composed of arginine and serine residues are phosphorylated and disordered, and play a role in spliceosome 
assembly ASF/SF2, SRp75, SRSF1   •K/A/P tandem repeats composed of lysine, alanine, and proline function in binding nucleosome linker DNA histone H1   •F/G disordered domains with phenylalanine-glycine repeats influence NPC gating behavior nucleoporins   •P/T/S extensively glycosylated regions rich in proline, threonine, and serine residues are involved in mucus formation mucins     •others     protein interactions fuzzy complexes by topology 242 •polymorphic a form of static disorder, with alternative bound conformations serving distinct functions by having different effects on the binding partner β-catenin ∼ Tcf4, NLS ∼ importin-α, actin ∼ WH2 domain   •clamp complex formation through folding upon binding of two disordered protein segments, connected by a linker that remains disordered Ste5 ∼ Fus3, myosin VI ∼ actin filament, Oct-1 ∼ DNA   •flanking complex formation through folding upon binding of a central disordered protein segment, flanked by two regions that remain disordered SF1 splicing factor ∼ U2AF, proline-rich peptides ∼ SH3 domains, p27Kip1 ∼ cyclin-Cdk2   •random disordered regions that remain highly dynamic even in the bound state elastin self-assembly, Sic1 ∼ Cdc4 fuzzy complexes by mechanism 176,251 •conformational selection the fuzzy region facilitates the formation of the binding-competent form by shifting the conformational equilibrium Max ∼ DNA, MeCP2 ∼ DNA   •flexibility modulation the fuzzy region modulates the flexibility of the binding interface and changes binding entropy Ets-1 ∼ DNA, SSB ∼ DNA   •competitive binding the fuzzy region serves as an intramolecular competitive partner for the binding surface. 
HMGB1 ∼ DNA, RNase1 ∼ RNase inhibitor   •tethering the fuzzy region increases the local concentration of a weak-affinity binding domain near the target, or anchors it via transient interactions RPA ∼ DNA, UPF1 ∼ UPF2, PC4 ∼ VP16 binding plasticity 257 •static mono-/polyvalent complexes, chameleons, penetrators, huggers for examples, see Figure 12   •coiled-coil based intertwined strings, long cylindrical containers, connectors, armature, tweezers and forceps, grabbers, tentacles, pullers, stackers     •dynamic cloud contacts and protein interaction ensembles   evolution sequence conservation 54 •flexible regions that require the property of disorder for functionality regardless of the exact sequence signaling and regulatory proteins (Sky1, Bur1)   •constrained regions of conserved disorder that also have highly conserved amino acid sequences ribosomal proteins (Rpl5), protein chaperones (Hsp90)   •nonconserved no conservation of the disorder, nor of the underlying sequence; no clear functional hallmarks yeast Ty1 retrotransposon domains A and B conservation of amino acid composition 260 •HR IDRs with high residue conservation transcription regulation and DNA binding   •LRHT IDRs with low residue conservation but high conservation of the amino acid composition of the region ATPase and nuclease activities   •LRLT IDRs with neither conservation of sequence nor conservation of amino acid composition (metal) ion binding proteins lineage and species specificity 159 •prokaryotes species from different kingdoms of life seem to use disorder for different types of functions longer lasting interactions involved in complex formation   •eukaryotes and viruses   transient interactions in signaling and regulation evolutionary history and mechanism of repeat expansion 61 •Type I repeats that showed no function diversification after expansion titin PEVK domain, salivary proline-rich proteins   •Type II repeats that acquired diverse functions through mutation or differential 
location within the sequence RNA polymerase II (CTD)   •Type III repeats that gained new functions as a consequence of their expansion prion protein octarepeats regulation expression patterns 208 •constitutive IDPs encoded by constitutively highly expressed transcripts are almost entirely disordered and often ribosomal proteins ribosomal L proteins   •high IDP-encoding transcripts showing high expression levels in most tissues and little tissue specificity protease inhibitors, splicing factors, complex assemblers   •medium these IDP-encoding transcripts are expressed at medium levels, with some tissue-specificity DNA binding, transcription regulation   •tissue-specific IDP-encoding transcripts with highly tissue-specific expression cell organization regulators, complex disassemblers   •low or transient IDP-encoding transcripts that are present in undetectable amounts; more than one-half of analyzed IDPs variety of functions alternative splicing 304,305,309,312,313   regulation and evolutionary patterns of inclusion and exclusion of IDR-encoding exons can provide insights into whether the encoded IDR functions in protein regulation and interactions a tissue-specific region with a phosphosite in the TJP1 protein in mouse, a mammalian-specific region in the PTB1 splicing regulator degradation kinetics 315,316,318,320,321 •degradation accelerators IDRs that can influence and accelerate proteasomal degradation of the protein containing it     •others IDRs that have no influence on protein half-life or increase it, e.g., because of sequence compositions that impede proteasome processivity low complexity sequences such as glycine-alanine repeats and polyglutamine repeats post-translational processing and secretion 337,340   secreted proteins are depleted for IDPs, but structural disorder is important in, e.g., prohormones, the extracellular matrix, and biomineralization pre-pro-opiomelanocortin, elastic fiber proteins, SIBLINGs, mucins biophysical properties solubility 
209   the sequence features of IDPs are generally associated with aqueous solubility, although some IDPs are thermostable, while others are not; this is likely modulated by sequence–structural ensemble relationships, such as the degree of compaction 4E-BP1, calpastatin, CREB, p21, p27, Sp1, stathmin, WASP phase transition 137,353   certain IDRs (such as those that contain specific low-complexity regions or interaction motifs) can undergo phase transitions like the formation of protein-based droplets or hydrogels multivalent SH3-binding motifs in phase separation, granule-like assemblies of RNA-binding proteins containing low-complexity IDRs, mucins biomineralization 117,341   structural disorder is common in proteins with roles in biomineralization, such as the formation of bone and teeth caseins, osteopontin, bone sialoprotein 2, dentin sialophosphoprotein

Table 2 Current Methods for Function Prediction of Intrinsically Disordered Regions and Proteins

basis for method | description | method | Web site
linear motifs | annotation of well-characterized linear motifs, which can be mapped onto other protein sequences | ELM 125 | http://elm.eu.org/
 | | MiniMotif 126 | http://mnm.engr.uconn.edu/
 | identification of putative uncharacterized motifs in protein sequences | SLiMPrints 372 | http://bioware.ucd.ie/slimprints.html
 | | phylo-HMM 373 | http://www.moseslab.csb.utoronto.ca/phylo_HMM/
 | | DiliMot 374 | http://dilimot.russelllab.org/
 | | SLiMFinder 375 | http://bioware.ucd.ie/slimfinder.html
PTM sites | resources of experimentally verified PTM sites, mostly phosphorylation | Phospho.ELM 268 | http://phospho.elm.eu.org/
 | | PhosphoSite 376 | http://www.phosphosite.org/
 | | PHOSIDA 377 | http://www.phosida.com/
 | identification and collection of peptide motifs that direct post-translational modifications | ScanSite 380 | http://scansite.mit.edu/
 | | NetPhorest 381 | http://netphorest.info/
 | | NetworKIN 382 | http://networkin.info/
 | | PhosphoNET 383 | http://www.phosphonet.ca/
molecular recognition features | collection of verified sequence elements that undergo coupled folding and binding | IDEAL 388 | http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/
 | prediction of sequences that undergo disorder-to-order transitions | MoRFpred 385 | http://biomine.ece.ualberta.ca/MoRFpred/
 | | ANCHOR 386 | http://anchor.enzim.hu/
intrinsically disordered domains | annotation of disordered protein domains, which can be detected by sequence profiles | Pfam 22 | http://pfam.sanger.ac.uk/
other | prediction of gene ontology functions using protein sequence features such as intrinsic disorder | FFPred 391 | http://bioinf.cs.ucl.ac.uk/psipred/
 | function annotation of experimentally verified disordered protein regions | DisProt 203 | http://www.disprot.org/
 | predictions of disordered regions combined with information on MoRFs, PTM sites, and domains | D2P2 49 | http://d2p2.pro/

2 Function

Dunker and co-workers 57 distinguished 28 separate functions for disordered regions, based on literature analysis of 150 proteins containing disordered regions of 30 residues or longer. These functionalities can be summarized as molecular recognition, molecular assembly, protein modification, and entropic chains. Further development of this scheme resulted in one comprising six different functional classes of disordered protein regions: entropic chains, display sites, chaperones, effectors, assemblers, and scavengers (Figure 4). 33,58 In another classification scheme, Gsponer and Babu classified IDR function into three broad functional categories: (i) facilitated regulation via diverse post-translational modifications, (ii) scaffolding and recruitment of different binding partners, and (iii) conformational variability and adaptability (Figure 5). 39 A single protein may consist of several disordered regions that belong to different functional classes. 59 The following section will address and exemplify the six functionalities of disordered regions.

Figure 4 Functional classification scheme of IDRs.
The function of disordered regions can stem directly from their highly flexible nature, when they fulfill entropic chain functions (such as linkers and spacers, indicated in dark-tone red), or from their ability to bind to partner molecules (proteins, other macromolecules, or small molecules). In the latter case, they bind either transiently as display sites of post-translational modifications or as chaperones (indicated in green), or they bind permanently as effectors, assemblers, or scavengers (indicated in dark-tone blue). More extensive descriptions and examples are found in the main text. Adapted with permission from ref (58). Copyright 2005 Elsevier.

Figure 5 Functional classification of IDRs according to their interaction features. (A) The flexibility of IDRs facilitates access to enzymes that catalyze post-translational modifications and effectors that bind these PTMs. This permits combinatorial regulation and reuse of the same components in multiple biological processes. (B) The availability of molecular recognition features and linear motifs within the IDRs enables the fishing for (“fly casting”) and gathering of different partners. (C) Conformational variability enables a nearly perfect molding to fit the binding interfaces of very diverse interaction partners. Context-dependent folding of an IDR can activate signaling processes in one case or inhibit them in another, resulting in completely different outcomes. Adapted with permission from ref (39). Copyright 2009 Elsevier.

2.1 Entropic Chains

Entropic chains carry out functions that benefit directly from their conformational disorder; that is, they function without ever becoming structured. Examples of entropic chains include flexible linkers, which allow movement of domains positioned on either end of the linker relative to each other, and spacers that regulate the distances between domains.
Evidence that flexibility is a functional characteristic that needs to be maintained came from studies on a family of flexible linkers in the 70 kDa subunit of replication protein A (RPA70), which display conserved dynamic behavior in the face of negligible sequence conservation. 60 The microtubule-associated protein 2 (MAP2) projection domain exemplifies spacer behavior as it repels molecules that approach microtubules, thereby providing spacing in the cytoskeleton. Another subcategory of entropic chains is that of entropic springs, such as those present in the titin protein, which contains repeat regions rich in PEVK amino acids that generate force upon overstretching to help restore muscle cells to their relaxed length. 61,62

2.2 Display Sites

Post-translational modifications (PTMs) affect the stability, turnover, interaction potential, and localization of proteins within the cell. 63 These aspects of PTMs are particularly relevant for proteins involved in regulation and signaling, as are many IDPs. 35,37,39,64,65 The conformational flexibility of disordered protein regions as display sites provides advantages over structured regions. (i) Flexibility facilitates the deposition of PTMs by enabling transient but specific interaction with catalytic sites of modifying enzymes. 47,66 This is because, upon binding, a flexible, disordered region loses more conformational freedom (i.e., entropy), which makes the overall free energy of binding less favorable, leading to weaker and more transient binding as compared to a folded protein region that interacts with equal strength (i.e., the same binding enthalpy, or equal specificity). 28,30,37 (ii) The flexibility of IDRs also allows for easy access and recognition of the PTMs within the IDR by effector proteins that mediate downstream outcomes upon binding.
47,66 Indeed, experimental and computational approaches have shown that disordered regions are enriched for sites that can be phosphorylated, 45,46,67 and suggest that IDPs are likely to be substrates of a large number of kinases and other modifying enzymes as they are heavily post-translationally modified. 46,68,69 Furthermore, PTM sites are often located within short peptide motifs, modification of which influences the affinity for interaction with diverse binding partners (see section 3.1). 70,71 In turn, disordered protein regions are strongly enriched for these motifs, 47,72−74 underlining the importance of intrinsic disorder as PTM display sites. Well-characterized examples of IDPs in which PTMs are key to function and regulation include, among others, histones, p53, and the cyclin-dependent kinase regulator p27. 75−77 2.3 Chaperones Chaperones are proteins that assist RNA and protein molecules to reach their functionally folded states. 78,79 Disordered regions make up over one-half of the sequences of RNA chaperones and over one-third of the sequences of protein chaperones. 80,81 The versatility of disordered segments seems well suited for chaperone function, although mechanistic evidence is still scarce. 82 First, their capacity to structurally adapt to many different binding partners matches the need for chaperones to bind a wide range of proteins. Second, disordered segments enable fast macromolecular interactions. This is because the highly dynamic nature of IDRs prolongs the lifetime of the encounter complex of the binding event due to rapid sampling of many different conformations, thereby increasing the number of nonspecific interactions as compared to an encounter of a structured protein. In turn, this results in a higher probability to sample the specific conformation that results in the stable interaction complex and increases the association rate of the interaction. 
83,84 The quick binding of misfolded proteins by disordered chaperones could, for example, prevent the formation of toxic aggregates by providing a solubilizing effect (see section 9.1). Finally, the binding thermodynamics of disordered regions are well suited for the cycles of repeated chaperone binding and release that enable substrate folding. It has been proposed that transient binding of disordered chaperone regions to misfolded substrates induces local folding of the disordered chaperone, and promotes unfolding of the substrate, thereby providing the substrate with a chance to refold correctly. 80 This reversible exchange of entropy represents a distinct type of chaperone function that relies on disordered regions and does not require ATP. Loss of flexibility of disordered regions upon substrate binding has been demonstrated for the chaperones GroEL 85 and α-crystallin. 86,87 This mechanism can even be switched on and off at need by regulated transitions between folded and disordered states, 88 as reported in the case of the redox-regulated chaperone Hsp33. 89,90 2.4 Effectors Another functional class of disordered regions is that of the effectors, which interact with other proteins and modify their activity. Upon binding their interaction partners, IDRs often undergo a disorder-to-order transition, also known as coupled folding and binding. 91,92 Examples of two effectors that fold upon binding are p21 and p27, which regulate different cyclin-dependent kinases (Cdk) that are responsible for the control of cell-cycle progression in mammals. 66 p21 and p27 exhibit functional diversity by achieving opposite effects on different Cdk–cyclin complexes, promoting the assembly and catalytic activity of some (e.g., Cdk4 paired with D-type cyclins), and inhibiting others (e.g., Cdk2 paired with A- and E-type cyclins). 66 Another effector IDP is calpastatin, which undergoes significant folding upon binding calpain, thereby achieving specific and reversible inhibition. 
93 IDRs can also affect the activity of other parts within the same protein, either through competitive interactions or through allosteric modulation. The intrinsically disordered GTPase-binding domain (GBD) of the Wiskott–Aldrich syndrome protein (WASP) illustrates competitive binding that controls autoinhibition. 94 Binding of the GBD to the Cdc42 protein promotes the interaction of WASP with the actin cytoskeleton regulatory machinery. However, the GBD adopts a different structure when it folds back on other parts of WASP to inhibit actin interaction. Indeed, autoinhibitory regions are generally enriched for intrinsic disorder and often have different structures in the inhibitory and functionally active states of the protein. 95 A striking example of allosteric coupling in a disordered protein was revealed between different binding sites in the adenovirus E1A oncoprotein. 96 Complexes of E1A with the TAZ2 domain of CREB-binding protein (CBP) and the retinoblastoma protein (pRb) can have either positive or negative cooperativity, depending on the available E1A interaction sites (i.e., binding of either pRb or CBP to E1A increases or decreases, respectively, the probability that the other one will also bind). These findings support earlier studies that suggest allosteric coupling does not always require a well-defined structural route to propagate through the protein, but can also be determined by the stabilities of individual conformations of the protein that change upon binding their interaction partners. 97−99 Such a mechanism could be one explanation for how the availability of different binding partners regulates the outcomes of multiple binding events involving disordered proteins in a cellular context. 
96 2.5 Assemblers Disordered assemblers bring together multiple binding partners to promote the formation of higher-order protein complexes, 100,101 such as the ribosome (many ribosomal proteins are disordered 102 ), activated T-cell receptor complexes, 58 the RIP1/RIP3 necrosome, 103 and the transcription preinitiation complex. 104 The presence of different functional regions within the disordered segments, such as molecular recognition features (MoRFs) and short linear peptide motifs (SLiMs), enables binding and can bring together different partners (see sections 3.1 and 3.2). Indeed, larger complexes are assembled from proteins that tend to be more disordered, 105 and intrinsic disorder is a common feature of hubs in protein interaction networks. 106,107 The open structure of disordered assemblers is largely preserved upon scaffolding their partner proteins, resulting in a large binding interface that enables multiple proteins to be bound by a single IDR. 108,109 Furthermore, disordered regions largely avoid the steric hindrance that prevents the formation of comparably large complexes from structured proteins. Assembler function can be imagined in two ways. (i) The first is structural mortar, which helps to bring together proteins by stabilizing the complexes they form. A well-studied example of this behavior is the assembly of the ribosome, which relies on a sequence of cooperative binding steps of protein and RNA. 110 Although the initial stages of rRNA folding are probably driven by the RNA itself, 111 ribosomal proteins subsequently fold upon binding the rRNAs, 112,113 which induces structural changes in both the RNA and the protein, and guides the complex toward its native state. 110 (ii) The second is scaffolds that serve as backbones for the spatiotemporally regulated assembly of different signaling partners. 
An example of this mechanism is the Axin scaffold protein, which colocalizes β-catenin, casein kinase Iα, and glycogen synthase kinase 3β by their binding to Axin’s long intrinsically disordered region, thereby effectively yielding a complex of structured domains with flexible linkers. 114 The assembly of all four proteins accelerates interactions between them by raising their local concentrations and leads to the efficient phosphorylation and subsequent destruction of β-catenin. Scaffolding regions have one of the highest degrees of disorder of all functional categories. 109,115 2.6 Scavengers The final distinct functional class of IDRs and IDPs is that of the scavengers, which store and neutralize small ligands. Chromogranin A, one of the earliest examples of an IDP, functions as a scavenger by storing ATP and adrenaline in the medulla of the adrenal gland. 116 NMR studies showed that chromogranin is a random coil in both the isolated form and in its cellular environment in the intact adrenal gland. 116 Caseins and other secretory calcium-binding phosphoproteins (SCPPs) are highly disordered proteins that solubilize clusters of calcium phosphate in milk and other biofluids (see section 9.3). 117 Finally, salivary proline-rich glycoproteins are scavenger IDPs that bind tannin molecules in the digestive tract. 33 3 Functional Features Different types of functional regions in intrinsically disordered proteins have been uncovered by investigations aimed both directly at increasing the understanding of IDRs and indirectly by linking previously studied functionality of proteins to disordered regions. First, the majority of linear motifs (such as the SH2 domain interaction motif) have been found to be enriched in IDRs. 48,72,118 Second, the development of disorder prediction methods (Box 3) has led to the identification of segments that promote disorder-to-order transitions called molecular recognition features (MoRFs), 119−123 which have been verified using known crystal structures. 
Third, some interaction domains identified using crystallography, by sequence analysis, and by other techniques, turn out to be intrinsically disordered in solution (e.g., the BH3 domain 124 ). The following section discusses these three interaction features separately and points out the underlying connections between them. 3.1 Linear Motifs A common functional module within IDRs is the linear motif, 47,48,72 also known as LMs, short linear motifs (SLiMs), 125 or MiniMotifs. 126 By regulating low-affinity interactions, these short sequence motifs (annotated instances are usually 3–10 amino acids long 48 ) can target proteins to a particular subcellular location, recruit enzymes that alter the chemical state of the motif by post-translational modifications (PTMs), control the stability of a protein, and promote recruitment of binding factors to facilitate complex formation. 47,48 Linear motifs, helped by the flexible nature of the disordered regions that surround them, 71 primarily bind onto the surfaces of globular domains, 127,128 and their compact binding surface promotes them to occur multiple times within one protein. 47,48 Moreover, the short nature of many linear motifs means they have a high propensity to convergently evolve and emerge in unrelated proteins. 47,48 A consequence of these properties is that pathogenic viruses and bacteria have evolved to mimic these linear motifs, allowing them to manipulate regulation of cellular processes. 129,130 Linear motifs can be broadly divided into two major families: those that act as modification sites and those that act as ligands, with each having numerous subgroups (Figure 6). 131 The first major family, the enzyme binding or modification motifs, can be divided into three groups. (i) The first is post-translational processing events or proteolytic cleavage. A well-known example is the motif recognized by Caspase-3 and -7, which has an [ED]xxD[AGS] consensus sequence. 
Caspases are a family of proteases that promote apoptosis and inflammation by cleaving such motifs in their substrate proteins. 132 Hundreds of proteins have convergently evolved the Caspase-3/-7 motif, and thereby have come under the regulation of the apoptotic pathway. 133 (ii) The second is PTM moiety removal and addition. Many enzymes that catalyze post-translational modifications recognize a specific binding sequence on the substrate. For example, the cyclin-dependent kinase recognition motif [ST]Px[KR] is present in many mitotic proteins, and its phosphorylation is key for regulating cell cycle progression. 134 (iii) The third is structural modifications. This group of motifs is involved in the catalyzed conformational alteration of a peptide backbone. The classic example is the peptidylprolyl cis–trans isomerase (PPIase) Pin1, which binds [ST]P motifs in a phosphorylation dependent manner to catalyze the cis–trans isomerization of the proline peptide bond. This modification can regulate the recognition of phosphorylated [ST]P sites by phosphatases. 135 Figure 6 Functional classification of linear motifs. Linear motifs can be divided into two major families, which each have three further subgroups. The modification class motifs all act as recognition sites for enzyme active sites, whereas the ligand class motifs are always recognized by the binding surface of a protein partner. More detailed classification beyond the graph shown here is possible. For example, an important subgroup of docking motifs are the degrons, which regulate protein stability by recruiting members of the ubiquitin–proteasome system. In the regular expressions, x corresponds to any amino acid, while other letters represent single letter codes of amino acids; letters within square brackets mean either residue is allowed in that position. The second major family of motifs comprises ligand motifs, which can also be divided into three main groups (Figure 6). 
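The consensus patterns quoted above for the modification motifs, such as [ED]xxD[AGS] and [ST]Px[KR], map almost directly onto ordinary regular expressions. The following is a minimal sketch of that mapping, not the method of any particular motif resource: the function names and the toy sequence are hypothetical, and a pattern match is only a candidate site, because short patterns are expected to match by chance.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert a motif consensus such as '[ST]Px[KR]' to a Python regex.

    Assumption: lowercase 'x' means "any amino acid" and uppercase letters
    are one-letter amino acid codes, as in the Figure 6 caption.
    """
    return consensus.replace("x", ".")


def find_motifs(sequence: str, consensus: str) -> list:
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead with a capture group lets overlapping matches be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MASPEKDAADGTPRKL"  # made-up sequence, for illustration only
    print(find_motifs(toy_sequence, "[ST]Px[KR]"))    # candidate Cdk sites
    print(find_motifs(toy_sequence, "[ED]xxD[AGS]"))  # candidate caspase-3/-7 sites
```

Overlapping matches are reported deliberately, since overlapping motifs are biologically meaningful (they can give mutually exclusive binding).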
(i) Complex promoting motifs are the most well-known class of motifs and include the phosphorylated tyrosine motif recognized by SH2 (Src homology 2) domains, the C-terminal motifs that bind PDZ domains, and the proline-rich PxxP motifs that interact with SH3 (Src homology 3) domains. 136 These motifs often function in protein scaffolding, and their multivalency (tendency to occur multiple times in one sequence) can increase the avidity of interactions and promote phase transition (see section 9.2). 137 (ii) Docking motifs increase the specificity and efficiency of modification events (e.g., addition or removal of PTMs, see above) by providing additional binding surface. These docking motifs are distinct from the modification sites, but are usually in the same protein. Examples are the KEN box and D box degrons, which act as recognition surfaces for ubiquitin ligases that ubiquitinate the protein on a different position, leading to degradation of the protein by the 26S proteasome. 138,139 The KEN box motif occurs in several key mitotic kinases to ensure their degradation or deactivation at mitotic exit. 139 In some cases, the docking site is present in a protein different from that which contains the modification site, as exemplified by the F box motif. Another part of F box proteins recognizes post-translationally modified degradation motifs of substrates, while the F box itself docks the Skp1 components of SCF (Skp, Cullin, F box) E3 ligase complexes. 140 (iii) Targeting motifs can localize proteins toward subcellular organelles. For example, importin proteins involved in nuclear transport recognize the nuclear localization signal (NLS), usually a motif containing a short cluster of lysines and arginines, and translocate NLS-containing proteins into the nucleus. 141 Targeting motifs can also act to traffic proteins, as in the case of endocytic motifs. 
These are recognized by adaptor proteins at different stages of endocytosis to ensure that cargo proteins are packaged into vesicles and trafficked to the right location. 142,143 An important feature of linear motifs is their propensity to act as molecular switches. This is for two major reasons. (i) Linear motif-mediated interactions are generally low affinity due to the limited binding surface. This means that large, bulky post-translational modifications have a big impact on their binding properties. 71 (ii) Their small footprint (i.e., size) allows motifs to occur multiple times in the same protein, thereby promoting high avidity interactions and the recruitment of multiple factors (e.g., the LAT complex in T-cell receptor signaling 144 ). 99 This also means two different motifs can overlap, resulting in mutually exclusive binding of interaction partners. 73 The ability of a motif to rapidly switch between binding partners and create multivalent complexes is crucial for the creation of dynamic signaling networks. 71 3.2 Molecular Recognition Features Disordered segments can also contain another type of peptide motif (10–70 amino acids) that promotes specific protein–protein interactions. These functional elements are called preformed structural elements (PSEs), 119 molecular recognition features (MoRFs) or elements (MoREs), 120−122 or prestructured motifs (PreSMos). 123 Importantly, MoRFs undergo disorder-to-order transitions upon binding their interaction partners (i.e., folding upon binding), 38,121,123 and often the unbound form of these preformed elements is biased toward the conformation that they adopt in the complex. 119 Preformed structural elements and MoRFs may serve as initial contact points for interaction events, which have different kinetic and thermodynamic properties than interactions between structured protein regions as discussed before. 
Binding of preformed elements is one version of conformational selection (see section 6), suggested long ago for interactions with flexible ligands. 145 At the other extreme is induced folding, in which structure formation and binding occur concomitantly after the formation of the initial encounter complex. Given the complexity of many complexes involving intrinsically disordered regions, interactions involving both conformational selection of preformed elements and induced folding likely occur. 92,146 MoRFs occurring in the Protein Data Bank 147 can be classified into subtypes according to the structures they adopt in the bound state: α-MoRFs, β-MoRFs, and ι-MoRFs (Figure 7A–C), 121 which form α-helices, β-strands, and irregular (but rigid) secondary structure when bound, respectively. MoRFs that contain combinations of different types of secondary structure are called complex (Figure 7D). 121 The p53 protein contains multiple MoRFs that are disordered in the absence of their interactors (Figure 7E). 120,121 The first p53 MoRF is located near the N-terminus and undergoes a transition from a disordered to an α-helical state upon interaction with the Mdm2 protein. In fact, this region of p53 exemplifies the high potential of IDRs for multiple partner binding as it is known to bind more than 40 different partners. However, for most of these complexes, the 3D structures are not determined, and therefore the MoRF type is not always known. The region between p53 residues 40 and 60 features an α-MoRF that functions as a secondary binding site for Mdm2 as well as a primary binding site for RPA70. 148 In the absence of any binding partner, this region shows evidence of minimal helical secondary structure, 149 whereas when bound to either Mdm2 150 or RPA70, 151 a stronger helical structure is observed. The C-terminal region of p53 also contains a MoRF that interacts with multiple partners, giving rise to different bound structures. 
For example, the S100B(ββ) protein induces a helical structure, while interaction with the Cdk2–cyclin A complex leads to an irregular ι-MoRF. An example of the role of MoRFs in scaffolding proteins is RNase E, which assembles the RNA degradosome. 152 The flexible C-terminal end of RNase E contains several recognition motifs that are central to its scaffolding function and serve as binding sites for other members of the degradosome. 153 For example, an α-MoRF interacts with enolase, 154 and a β-MoRF binds polynucleotide phosphorylase. 155 The recognition features are connected by disordered segments that accommodate assembly of the multiprotein complex by providing the required space and flexibility. Lee and co-workers 123 have annotated the secondary structure propensities of many other regions that display transient structural elements and undergo disorder-to-order transitions, all of which have been experimentally confirmed by NMR spectroscopy. Figure 7 Classification of molecular recognition features (MoRFs) based on the secondary structure of the bound state. MoRFs (red ribbons) undergo disorder-to-order transition upon binding their partners (blue surfaces). (A) α-MoRF. BH3 domain of BAD (MoRF) bound to bcl-xl (partner) (PDB ID: 1G5J). (B) β-MoRF. Inhibitor of apoptosis protein DIAP1 (partner) bound to N-terminus of cell death protein GRIM (MoRF) (PDB ID: 1JD5). (C) ι-MoRF. AP-2 (partner) bound to the recognition motif of amphiphysin (MoRF) (PDB ID: 1KY7). (D) Complex-MoRF. Phosphotyrosine-binding domain (PTB) of the X11 protein (partner) bound to amyloid β A4 protein (MoRF) (PDB ID: 1X11). Note that the PTB domain of X11 actually binds unphosphorylated peptides and is a PTB by sequence similarity. Panels A–D reprinted with permission from ref (122). Copyright 2007 American Chemical Society. (E) Promiscuity of disorder-controlled interactions illustrated by the p53 interaction network. 
A structure versus disorder prediction on the p53 amino acid sequence is shown in the center of the figure (up = disorder, down = order) along with the structures of various regions of p53 bound to 14 different partners. The predictions for a central region of structure, and the disordered amino and carboxyl termini have been confirmed experimentally for p53. The various regions of p53 are color coded to show their structures in the complex and to map the binding segments to the amino acid sequence. Starting with the p53–DNA complex (top, left, magenta protein, blue DNA), and moving in a clockwise direction, the Protein Data Bank 147 IDs and partner names are given as follows for the 14 complexes: (1tsr – DNA), (1gzh – 53BP1), (1q2d – gcn5), (3sak – p53 (tetramerization domain)), (1xqh – set9), (1h26 – cyclin A), (1ma3 – sirtuin), (1jsp – CBP bromo domain), (1dt7 – s100ββ), (2h1l – sv40 Large T antigen), (1ycs – 53BP2), (2gs0 – PH), (1ycr – MDM2), and (2b3g – RPA70). Reprinted with permission from ref (40). Copyright 2010 Elsevier. Sequence context can play an active role in modulating the degree of structural preorganization of a MoRF. An example pertains to the study of DNA binding motifs in the basic regions (bRs) of basic region leucine zipper transcription factors. 156 The bRs are 28–30 residue long regions predicted to be highly disordered and include a strongly conserved 10-residue DNA binding motif (DBM). The α-helicity (i.e., preference for α-helical conformation) of the DBM in the unbound form is modulated by the sequence of the N-terminal segment that is directly in cis to the DBM. 156 For example, the N-terminal sequence contexts of Gcn4 and Cys3 DBMs contribute to a higher level of helicity of the DBM than the same region in c-Fos and Fra1 (whose DBMs have a low helicity). 
Essentially, the N-terminal sequence contexts are helix caps, and these can be used in different ways to ensure different levels of structural preorganization within an α-MoRF, thereby suggesting that investigating sequence contexts can provide useful clues when classifying MoRFs and linear motifs. 157 3.3 Intrinsically Disordered Domains Most protein domains that are identified using sequence-based approaches are structured, but some can be fully or largely disordered 158 or contain conserved disordered regions, 159 known as intrinsically disordered domains (IDDs). For instance, about 14% of Pfam domains have more than 50% of their residues in predicted disordered regions. Many well-known domains, such as the kinase-inhibitory domain (KID) of Cdk inhibitors (e.g., p27 66 ) and the Wiskott–Aldrich syndrome protein (WASP)-homology domain 2 (WH2) of actin-binding proteins, 158 have been shown experimentally to be fully disordered in isolation and solution. Protein domains with conserved disordered regions have a variety of functions, but are most commonly involved in DNA, RNA, and protein binding. 159 Furthermore, domains that were gained during evolution by the extension of existing exons contain the highest degree of disordered regions. 160 This suggests that exonization of previously noncoding regions could be an important mechanism for the addition of disordered segments to proteins. Interestingly, it has also been observed that particular disordered regions frequently co-occur in the same sequence with specific protein domains. 161,162 Some domain families appear only to require the presence of disorder in their neighborhood for functioning, while others seem to rely on the occurrence of disordered regions in specific locations relative to the start or end of the protein domain. 
161 For example, particular combinations of domains, involved mainly in regulatory, binding, receptor, and ion-channel roles, only occur with a disordered region inserted between them, while others only occur without a disordered domain between them. These observations imply that short disordered regions in the vicinity of protein domains complement the function of a structured domain, and in some cases may comprise separate functional modules in their own right. Thus, the co-occurrence of IDRs and structured domains in the same protein might be useful to gain insight into unannotated disordered regions. 3.4 Continuum of Functional Features A measure that is often used to distinguish the different types of disordered binding modules is length; however, this is likely to stem primarily from the different methodology used for their detection. Protein domain detection relies on hidden Markov models, 22 which is not the best approach for identifying short sequences, and therefore domain annotation tends to focus on larger sequence regions. In contrast, linear motifs in the ELM database are biased toward short binding modules (∼3–10 amino acids 48,125 ) as these are more straightforward to annotate. Finally, the tendency of MoRFs and preformed elements to undergo disorder-to-order transitions and the statistics used for their detection means that these features tend to be slightly longer than annotated linear motifs. Thus, although there are differences in the definitions of linear motifs and MoRFs, they share many common features 72,163 including a tendency to undergo disorder-to-order transition (all MoRFs by definition and ∼60% of LMs 48 ), an enrichment in IDRs (MoRFs by definition and ∼80% of LMs are in IDRs 48,72 ), and a tendency to promote complex formation. 48,100,122 Intrinsically disordered domains (IDDs) can also have significant overlap with MoRFs and linear motifs. 
For example, the WH2 domain is considered an IDD 158 and is also defined as a motif in the ELM database. 125 One feature that is probably more common in IDDs is that some are not only capable of binding to well-folded, structured domains (a mechanism shared with motifs and MoRFs), but can also bind each other in a process of mutually induced folding. For example, the nuclear coactivator binding domain (NCBD) of CREB-binding protein (CBP) and the activator for thyroid hormone and retinoid receptors (ACTR) domain of p160 are both disordered on their own but upon interaction form a complex by mutual synergistic folding. 164 The overlap between linear motifs and MoRFs especially, but also IDDs, suggests that these functional features are different states in the same continuum of binding mechanisms involving disordered regions. 4 Structure Intrinsically disordered regions and proteins show a wide variety of structural subtypes. These different types of disorder can be characterized using an array of experimental techniques (Box 2), and several resources collect computationally identified and experimentally verified disordered regions (Box 1). The following section discusses classification schemes that are based on structural features of disordered proteins. 4.1 Structural Continuum Proteins have been proposed to function within a conformational continuum, ranging from fully structured to completely disordered. 37 The spectrum covers tightly folded domains that display either no disorder or only local disorder in loops or tails, multidomain proteins linked by disordered regions, compact molten globules containing extensive secondary structure, collapsed globules formed by polar sequence tracts, unfolded states that transiently populate local elements of secondary structure, and highly extended states that resemble statistical coils (Figure 8). 
In this model, there are no boundaries between the described states and native proteins could appear anywhere within the continuous landscape. IDRs are highly dynamic and fluctuate rapidly over an ensemble of heterogeneous conformations (see section 4.2). 165 Thus, an IDR may fluctuate stochastically between several different states, transiently sampling coil-like states, localized secondary structure, and more compact globular states. Transient localized elements of secondary structure (most often helices) are common in amphipathic regions of the sequence and potentially play a role in binding processes. 92 The structural characteristics and populations of the individual states in the conformational ensemble and the degree of compaction of the polypeptide chain are determined by the nature of the amino acids and their distribution in the IDR sequence (see section 5.1). 166−168 For example, low and high average charges typically lead to disordered globules and swollen coils, respectively. 166,167 Figure 8 Schematic representation of the continuum model of protein structure. The color gradient represents a continuum of conformational states ranging from highly dynamic, expanded conformational ensembles (red) to compact, dynamically restricted, fully folded globular states (blue). Dynamically disordered states are represented by heavy lines, stably folded structures as cartoons. A characteristic of IDPs is that they rapidly interconvert between multiple states in the dynamic conformational ensemble. In the continuum model, the proteome would populate the entire spectrum of dynamics, disorder, and folded structure depicted. 4.2 Conformational Ensembles Disordered regions in the native unbound state exist as dynamic ensembles of rapidly interconverting conformations, 165,169,170 which can be described by relatively flat energy landscapes. 
99,171,172 Conditions, post-translational modifications, and binding events (see section 6) change the relative free energies of individual conformations as well as the energy differences between conformations. 99,173−176 As a result, the populations of individual conformations within the ensemble change under different conditions. These individual states are often important for function. Thus, the dynamic nature of IDPs is best modeled by statistical approaches that describe the probabilities of individual conformations in the ensemble, 172,177,178 and is best measured by experimental techniques that prevent conformational averaging (Box 2). 179−182 4.3 Protein Quartet The protein quartet model proposes that protein function can arise from four types of conformational states and the transitions between them: random coil, pre-molten globule, molten globule, and folded (Figure 9). 32,34 In this model, unbound disordered regions could fall into all categories except for “folded”. Proteins in the pre-molten globule state are less compact than molten globules, but still show some residual secondary structure. In contrast, proteins in the random coil state show little or no secondary structure. The pre-molten globule state has a high propensity to participate in folding upon binding events, 183 which would make this structural state suitable for disordered regions acting as effectors and scaffolds. On the basis of the notion that IDPs and IDRs possess great structural and sequence heterogeneity, proteins may also be considered as modular assemblies of foldons (independently foldable regions), inducible foldons (foldable regions that can gain structure as a result of interaction with specific partners), semifoldons (regions that are always partially folded), and nonfoldons (regions that never fold). 
184 The four distinct conformational states of the quartet model are a subset of the continuous spectrum of differently disordered states (see section 4.1), 37 which extends from fully ordered to completely structure-less proteins, with everything in between. A single description of structure (such as the quartet states) may be suitable for the conformational average of a protein, while a structural continuum is a better description of an ensemble of different conformations (see section 4.2). Figure 9 The protein quartet model of protein conformational states. In accordance with this model, protein function arises from four types of conformations of the polypeptide chain (ordered forms, molten globules, pre-molten globules, and random coils) and transitions between any of these states. FG nucleoporins are an example of the functional significance that different disordered conformations can have. The porins make up the central part of nuclear pore complexes (NPCs) and regulate nucleocytoplasmic transport. 185 Intrinsically disordered regions with multiple phenylalanine-glycine (FG) motifs make up large parts of the NPC gates. FG regions adopt various disordered conformations with specific functions. 186 Some regions have the low charge characteristics of collapsed coils, while others are characterized by a high degree of charged amino acids, giving rise to relaxed and extended coil structures. Molecular dynamics simulations have shown that extended coils are more dynamic than collapsed coils, suggesting distinct functionalities for the two structural groups. Interestingly, some FG nucleoporins feature both types of disorder along their polypeptide chain. Combinations of disorder subtypes in nucleoporin domains are likely to contribute to NPC gating behavior by creating “traffic” zones with distinct physicochemical properties that influence the dynamics of substrate translocation through the nuclear envelope. 
186−189 4.4 Supertertiary Structure IDRs allow for complex regulatory phenomena, as witnessed in the case of multidomain proteins in signaling and regulation. 39,66,70,71,136,190 Because of the presence of structural disorder, functional domains, and short motifs, multidomain proteins are characterized by a dynamic ensemble of tertiary conformations. Some conformations are dominated by intramolecular domain–domain and domain–motif interactions and are closed and structured in nature, while other conformations are more open and disordered. This state of conformational variability within a protein lies between the tertiary structure of proteins and the quaternary structure of multiprotein assemblies, and has been termed supertertiary structure. 191 Complex regulatory function stems from transitions in the ensemble of these structures, as demonstrated by several well-characterized proteins, such as the Wiskott–Aldrich syndrome protein (WASP, see section 2.4), 94 the Src-family tyrosine kinase Hck, 192 and the E3 ubiquitin ligase Smurf2. 193 5 Sequence The sequences of IDPs and IDRs have distinct compositional biases. They are enriched in charged and polar amino acids and depleted in bulky hydrophobic groups. 31,44,194,195 These biases have led to the inference that disorder is a natural consequence of weakening the hydrophobic effects that drive folding of polypeptides into compact tertiary structures. Although disordered regions generally lack the ability to fold independently due to these biases in amino acid composition, distinct subsets of sequences that have different structural and functional characteristics can be identified within IDRs. The special sequence properties of disordered regions are the basis for many disorder prediction methods (Box 3). The following section covers sequence-based classification schemes of IDRs. 
5.1 Sequence–Structural Ensemble Relationships Systematic efforts combining experiments and computations have addressed the relationship between information encoded in amino acid sequences and the ensemble of conformations (see section 4.2) these sequences can sample in different conditions. These studies have focused on three major archetype sequences: polar tracts, polyelectrolytes, and polyampholytes. 196 Polar tracts are sequence stretches enriched in polar amino acids such as glutamine, asparagine, serine, glycine, and proline, and deficient in charged as well as hydrophobic residues. These polar tracts (especially glutamine-, asparagine-, and glycine-rich sequences) form globules that are generally devoid of significant secondary structure preferences 170,197−199 and can be as compact as well-folded domains. 196 Collapse of polar tracts arises from the preference for self-solvation over solvation by the aqueous milieu. In this case, disorder derives from a lack of specificity for a single compact conformation; instead, heterogeneous ensembles of conformations with similar stabilities and compactness are formed. The free energy landscape of polar tracts is weakly funneled and resembles an “egg carton”. 200 Interestingly, the drive to collapse, which implies a drive to minimize the interface between the IDR and the surrounding solvent, can also give rise to significant aggregation and solubility problems, 201 as is the case with several glutamine-, asparagine-, and glycine-rich sequences that are implicated in amyloid formation and phase separation. 202 At another end of the compositional spectrum are polyelectrolytes. Their amino acid compositions are biased toward charged residues of one type, as in the arginine-rich protamines 166 or the Glu/Asp-rich prothymosin α.
167 Experiments and simulations have shown that the tendency of polypeptide backbones to form ensembles of collapsed structures can be reversed by increasing the net charge per residue past a certain threshold (Figure 10A). The transition between globules and expanded coils is sharp, suggesting that small changes to the net charge per residue through post-translational modifications such as serine or threonine phosphorylation or lysine acetylation could cause reversible globule-to-coil transitions. These transitions might control the accessibility of SLiMs and MoRFs or even modulate the conformations of these elements. Figure 10 Original 166 and modified 204 diagram-of-states to classify predicted conformational properties of IDPs (and IDRs modeled as IDPs). (A) The original diagram predicts that sequences with a net charge per residue above 0.25 will be swollen coils. The three axes denote the fraction of positively charged residues, f+, the fraction of negatively charged residues, f–, and the hydropathy. All three parameters are calculated from the amino acid composition. Green dots correspond to 364 curated disordered sequences extracted from the DisProt database. 203 These sequences have hydropathy values that designate them as being disordered; that is, they lie in the bottom portion of the pyramid by definition. Additional filters were used for chain length (more than 30 residues) and the fraction of proline residues (f pro < 0.3). 95% of sequences used in this annotation have a net charge per residue of less than 0.26 and are thus predicted to be globule formers. 204 Adapted from ref (166). Copyright 2010 National Academy of Sciences of the United States of America. (B) Modified diagram-of-states from panel (A) with a focus only on the bottom portion of the pyramid (i.e., stipulating that the hydropathy is low enough to be ignored).
204 The polyampholytic contribution expands the space encompassed by nonglobule-formers by subdividing the disordered globules space in panel (A) into three distinct regions of which sequences in regions 2 and 3 actually may not form globules. In these polyampholytic regions, one has to account for the total charge, in terms of the fraction of charged residues (FCR), as well as the net charge per residue (NCPR) as opposed to NCPR alone. Conformations in regions 2 and 3 are expected to be random-coil-like if oppositely charged residues are well mixed in the linear sequence. Otherwise, one can expect compact or semicompact conformations. The classification scheme uses only the amino acid sequence as input. Reprinted with permission from ref (204). Copyright 2013 National Academy of Sciences of the United States of America. The impact of the net charge per residue on the conformational properties of IDRs can be summarized in a diagram-of-states (Figure 10A), 166 which generalizes the original charge-hydropathy plot. 31 The diagram classifies IDRs on the basis of their amino acid compositions. Annotation using curated disordered sequences from the DisProt database 203 (Box 1) initially suggests that a vast majority (∼95%) of IDPs have amino acid compositions that predispose them to be globule formers (Figure 10A). 204 However, most of these predicted globule formers are actually polyampholytes in that they are enriched in charged residues but have roughly equal numbers of positive and negative charges. 204 Although such sequences are classified as globule formers on the basis of their low net charge per residue, in reality the conformational properties of polyampholytes are governed by the linear sequence distribution of oppositely charged residues. If the oppositely charged residues are segregated in the linear sequence, then electrostatic attractions between oppositely charged blocks cause chain collapse and result in hairpin or globular conformations. 
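The composition parameters used in the diagram-of-states can be computed directly from a sequence. The following is a minimal sketch, not the published implementation: the function names are illustrative, histidine is assumed to be neutral, and the only cutoff applied is the net charge per residue of 0.25 quoted for the original diagram. Hydropathy and the linear patterning of charges, which the text identifies as decisive for polyampholytes, are deliberately not modeled.

```python
# Assumption: Lys/Arg count as positive, Asp/Glu as negative, His as neutral.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge_profile(seq):
    """Return f+, f-, FCR and NCPR for an amino acid sequence (one-letter codes)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    f_plus = sum(aa in POSITIVE for aa in seq) / n
    f_minus = sum(aa in NEGATIVE for aa in seq) / n
    return {
        "f_plus": f_plus,
        "f_minus": f_minus,
        "FCR": f_plus + f_minus,        # fraction of charged residues
        "NCPR": abs(f_plus - f_minus),  # net charge per residue (magnitude)
    }


def original_diagram_class(seq, ncpr_threshold=0.25):
    """Coarse call based on NCPR alone, as in the original diagram-of-states."""
    if charge_profile(seq)["NCPR"] > ncpr_threshold:
        return "swollen coil"
    return "globule former (by NCPR alone)"


if __name__ == "__main__":
    for s in ("RRRRGGSS", "KEKEKEKE", "QQNNGGSS"):
        print(s, charge_profile(s), original_diagram_class(s))
```

In this sketch the alternating sequence KEKEKEKE has NCPR = 0 but FCR = 1, so a classification by NCPR alone labels it a globule former even though it is a strong polyampholyte. This is the situation that motivates considering FCR together with NCPR in the modified diagram.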
In sequences with well-mixed oppositely charged residues, the effects of electrostatic repulsions and attractions counterbalance. These mixed sequences adopt random-coil or globular conformations, depending on the total charge (in terms of the fraction of charged residues) (Figure 10B). Many IDPs are strong polyampholytes with well-mixed linear patterns of oppositely charged residues. 204 Thus, IDPs are actually enriched in different classes of random coils that form swollen, loosely packed conformations (Figure 10B). Such random-coil sequences are likely to help improve the solubility profiles of connected structured domains (see section 9.1) and to promote the flexibility that is required for functions such as entropic tethers, which promote high local concentrations of connected protein parts, or entropic bristles, which occupy large volumes by rapid exploration of conformations. These biophysical principles of sequence–structural ensemble relationships enable the use of de novo sequence design as a tool for modulating these properties and assessing their impact on functions associated with IDPs and IDRs. 5.2 Prediction Flavors Methods for predicting disordered regions have generally been successful (Box 3), but their prediction accuracies vary for different types of disordered regions. 205 Some predictors accurately predict certain disordered regions but have lower accuracy predicting others, whereas other predictors give opposite results. Vucetic and co-workers 205 classified protein disorder into three different “flavors” based on competition between disorder predictors. These V, C, and S disorder flavors (corresponding to the names of the disorder predictors that best predict them: VL-2V, VL-2C, and VL-2S) show differences in sequence composition, and combinations of flavors could be associated with different protein functions. 
For example, disordered regions that bind to other proteins are enriched for flavor S, while disordered ribosomal proteins predominantly belong to flavor V. Flavor C gave strong disorder predictions for sugar binding domains. 5.3 Disorder–Sequence Complexity Space The relationship between sequence complexity and disorder propensity provides further insight into the structural and functional variations of IDRs. 206 Different functional classes of proteins often show a different disorder–sequence complexity (DC) space distribution. A frequently observed DC-distribution is composed of a compact structured part and a section extending out into the low-complexity and high-disorder space before looping back into the structured region. This pattern describes a disordered linker region between structured domains. An example is the bacterial translation initiation factor, which contains a sequence that locates to the low-complexity, high-disorder region of DC space. This loop connects the N- and C-terminal domains, which are high-structure and high-complexity. 206,207 Functionally related proteins have similar disorder–sequence complexity distributions, suggesting that these distributions might be useful for predicting the function of a disordered region. 5.4 Overall Degree of Disorder Large-scale studies into IDP function often group the proteins on the basis of some measure of disorder. For example, protein sequences have been categorized on the basis of the overall degree of disorder (i.e., the fraction of residues that is shown or predicted to be disordered), 68,208 resulting in groups of structured proteins (0–10% disorder), moderately disordered proteins (10–30% disorder), and highly disordered proteins (30–100% disorder). For 24% of human protein-coding genes, at least 30% of residues are predicted to be disordered (Figure 2A). 
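The grouping by overall degree of disorder can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be a per-residue list of booleans (True for a residue shown or predicted to be disordered), and the assignment of the exact 10% and 30% boundaries to the higher group is an assumption, since the ranges quoted above share their end points.

```python
def disorder_fraction(mask):
    """Fraction of residues flagged as disordered in a per-residue boolean mask."""
    if len(mask) == 0:
        raise ValueError("empty mask")
    return sum(bool(x) for x in mask) / len(mask)


def disorder_category(fraction):
    """Map a disorder fraction (0..1) onto the three groups described in the text."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < 0.10:
        return "structured"
    if fraction < 0.30:  # assumption: exactly 30% counts as highly disordered
        return "moderately disordered"
    return "highly disordered"


if __name__ == "__main__":
    mask = [True] * 2 + [False] * 8  # 20% of residues disordered
    print(disorder_category(disorder_fraction(mask)))
```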
Other studies classified proteins on the basis of an overall score of disorder for the whole protein, 209 and the presence or absence of continuous stretches of disordered residues with a specific length. 35,51,161,208 Largely structured proteins are enriched for metabolic functions, while highly disordered proteins function predominantly in regulation. Hence, classification of disordered proteins based on the level of disorder provides clues about what types of functions are likely. 5.5 Length of Disordered Regions The length of IDRs in human follows a power law distribution: there are large numbers of short disordered regions and increasingly smaller numbers of longer ones. 210 Other eukaryotic and prokaryotic proteomes show similar disorder length profiles. 44% of human protein-coding genes contain substantial disordered segments of >30 amino acids in length 49 (similar data shown in Figure 2A). Short IDRs may function as linkers and contain individual linear motifs or MoRFs, whereas longer disordered regions might be entropic chains or contain combinations of motifs or domains functioning in recognition. Very long disordered regions (more than 500 residues) are typically over-represented in transcription-related functions, 211 whereas proteins containing IDRs of 300–500 residues in length are enriched for kinase and phosphatase functions. Shorter IDRs (less than 50 residues) tend to be linked to metal ion binding, ion channels, and GTPase regulatory functions. Thus, the length of a disordered region can also provide a useful indication about the functional nature of the protein containing it. 5.6 Position of Disordered Regions Almost all human proteins have some disordered residues within their terminal regions. 59 For example, 97% of proteins have predicted disorder in the first or last five residues. 161 Disordered N-terminal tails are common in DNA-binding proteins, and have been shown to contribute to efficient DNA scanning. 
212 Furthermore, proteins that are relatively rich in disordered residues at the C-terminus are more often associated with transcription factor repressor and activator activities than proteins rich in internal or N-terminal disorder. 211 Membrane proteins, depending on their topology of insertion, also contain disordered regions in the N- or C-terminus, but their sequence composition differs from that of disordered regions in cytosolic proteins. 213 Ion channel proteins are enriched for disordered residues at the N-terminus, and the same is true to a lesser extent for C-terminal disorder. 211 These terminal disordered regions are often functionally relevant, as illustrated by their role in the inactivation of voltage-gated potassium channels. 214 Similarly, many G-protein-coupled receptors (GPCRs) have large disordered regions in their C-terminus, and often in the intracellular loops. 215 Several of them harbor peptide motifs that link ligand binding in the transmembrane region of the receptor to intracellular effectors, or contain PTM sites or linear motifs that govern their stability. 216 Finally, proteins that are relatively rich in internal disordered regions are weakly enriched for transcription regulator and DNA binding activity. 211 Thus, the relative position of a disordered region in a sequence provides clues about the function of the protein containing it. 5.7 Tandem Repeats Short tandem repeats are common in IDRs and IDPs. 61,217−220 For instance, as many as 96% of polyglutamate and polyserine stretches lie within disordered regions. 219 Similarly large fractions were found for proline, glycine, glutamine, lysine, aspartate, arginine, histidine, and threonine repeats. In contrast, polyleucine stretches occur predominantly within structured regions.
These observations agree with the compositional bias of disordered regions (see section 5.1); the most common tandem repeats in IDRs are made up of disorder-promoting residues 44,194 and of sequence patterns that are typically associated with disorder. 195 Moreover, a distinction between perfect and imperfect tandem repeats suggests that as the repeat perfection increases, so does the disorder content. 219 Repeats of different composition have been linked to specific functions. 218,221 Consequently, the presence of particular types of repeats is likely to contribute to IDR functioning. Descriptions and examples of different classes of disordered tandem repeats and their structural characteristics have been reviewed previously. 218 For instance, polyproline and polyglutamine stretches are associated with protein and nucleic acid binding and transcription factor activity. 222,223 Protein segments enriched for glutamine and asparagine often occur in disordered regions 224 and are abundant in eukaryotic proteomes, 225 despite their propensity to aggregate or form coiled-coil structures. 226 The aggregation propensity of the Q/N-enriched segments is exploited in the formation of physiologically relevant assemblies such as P-bodies (e.g., Ccr4 and Pop2), stress granules, and processing bodies. 227 However, expanded polyglutamine repeats are also associated with neurodegenerative disorders, the most well-known being Huntington’s disease. 228 Moreover, several prion-like yeast proteins (e.g., Sup35p and Ure2p) contain intrinsically disordered Q/N-rich protein segments that have been implicated in the switch between a soluble and an insoluble, aggregated form. 225,229 Another example of functional disordered repeats occurs in the SR protein family of splicing factors (e.g., ASF/SF2 and SRp75). 230,231 SR proteins mediate the assembly of spliceosome components. 
They consist of an N-terminal RNA-recognition motif and a disordered C-terminus with tandem repeats of arginine and serine residues (RS domain). Phosphorylation switches the RS domain of the serine/arginine-rich splicing factor 1 (SRSF1) from a fully disordered state to a more rigid structure. 232 Other disordered repeats associated with a specific function include sequences enriched in lysine, alanine, and proline in the histone H1 C-terminal domain, which are involved in the formation of 30 nm chromatin fiber by binding linker DNA between the nucleosomes. 233,234 A final example is dentin sialophosphoprotein (DSPP), which contains extensively phosphorylated repeats of aspartic acid and serine involved in calcium phosphate binding (see section 9.3). 235 Some repeat-containing regions are also prone to undergo phase transitions from a soluble monomeric state to an insoluble large assembly form, as demonstrated for regions rich in proline, threonine, and serine residues in mucins (see section 9.2). 236 6 Protein Interactions Disordered region-mediated molecular interactions have been proposed to work using a combination of conformational selection and induced folding. 92,146,237 These mechanisms of binding are two extreme possibilities and are not mutually exclusive. Both play a role in the interaction between two proteins, the dominant mechanism depending, for example, on the concentrations of the individual proteins 238 and the association rate constants. 84 In conformational selection, addition of binding partners can result in a population shift in the conformational ensemble of a disordered protein (see section 4.2) toward the conformation that is most favorable for binding. 119,145,173,175 This mechanism has been observed in both protein–protein and protein–nucleic acid interactions. 
173 Evidence for the role of conformational selection in IDP binding comes, for example, from the interaction between PDEγ and the α-subunit of transducin, 239 which is important in phototransduction. The dynamic ensemble of unbound PDEγ includes a loosely folded state that resembles its structure when bound to transducin. In induced folding, a protein undergoes a disorder-to-order transition upon association with its binding partner. 92,146,240 Evidence for this mechanism in IDP binding comes, for example, from a study investigating the disordered pKID region of CREB and the KIX domain of CREB-binding protein. Upon binding of pKID to the KIX domain, an ensemble of transient encounter complexes forms, which appear to be stabilized primarily by hydrophobic contacts and evolve to form the fully bound state via an intermediate state without dissociation of the two domains. 91,241 6.1 Fuzzy Complexes Although disordered protein regions frequently fold upon interacting with other proteins, complexes with IDPs often retain significant conformational freedom and can only be described as structural ensembles. 242 The conformations that disordered proteins adopt in the bound state cover a continuum, similar to the structural spectrum of free, unbound IDPs, 243 and range from static to dynamic, and from full to segmental disorder. 242 In static disordered complexes, disordered regions can adopt multiple well-defined conformations in the complex, whereas in dynamic disorder they fluctuate between various states of an ensemble in the bound state. Disorder in the bound state can be classified into four molecular modes of action, each of which is associated with specific molecular functions (Figure 11A–D). 176,242 (i) The polymorphic model is a form of static disorder, with alternative bound conformations serving distinct functions by having different effects on the binding partner.
Examples are the Tcf4 β-catenin binding domain 244 and the WH2 binding domains of thymosin β4 or ciboulot, 245 which have been shown to adopt several distinct conformations upon β-catenin and actin binding, respectively. Different actin–WH2 domain complexes have alternative interaction interfaces and result in actin polymers with different topologies. 245 The (ii) clamp and (iii) flanking models represent forms of dynamic disorder in which complex formation either involves folding upon binding of two disordered segments that are connected by a linker that remains disordered, or the reverse situation, respectively. The cyclin-dependent kinase (Cdk) inhibitor p21, for example, acts as a clamp. It contains a dynamic helical subdomain that serves as an adaptable linker that connects two binding domains and enables these to specifically bind distinct cyclin and Cdk complex combinations. 246 In both the clamp and the flanking models, disordered regions near the interacting protein segments (often short peptide motifs) contribute to binding by influencing affinity and specificity. 242,247 This phenomenon relates to the importance of the sequence context in modulating disordered binding elements (see section 3). Finally, (iv) the random model is an extreme version of dynamic disorder in protein complexes, which occurs when the IDR remains largely disordered even in the bound state. In this case, interaction is achieved via linear motifs that do not get fixed upon binding. An example is the self-assembly of elastin, where solid-state NMR has provided evidence for dynamic disorder within elastin fibers, which exhibit random-coil like chemical shift values. 248 Another case is the complex between the Cdk inhibitor Sic1 and the SCF ubiquitin ligase subunit Cdc4, which is formed in a phosphorylation-dependent manner. 
249 At any given time, only one out of nine Sic1 phosphorylation sites interacts with the core Cdc4 binding site, while the others contribute to the binding energy via a secondary binding site or via long-range electrostatic interactions (Figure 12N). Hence, binding interchanges dynamically within the Sic1–Cdc4 complex to provide ultrafine tuning of the affinity. 249,250 Figure 11 Classification of fuzzy complexes by topology (upper panel) and by mechanism (lower panel). Blue arrows indicate interactions between fuzzy disordered regions and structured molecules. Protein Data Bank 147 identifiers for the structures are given in parentheses. Topological categories: (A) Polymorphic. The WH2 domain of ciboulot interacts with actin in alternative locations: via an 18-residue segment (3u9z) or via only three residues (2ff3). The flanking regions remain dynamically disordered. (B) Clamp. The Oct-1 transcription factor has a bipartite DNA recognition motif. The two globular binding domains are connected by a 23-residue-long disordered linker (1hf0), shortening of which reduces binding affinity. (C) Flanking. The p27Kip1 cell-cycle kinase inhibitor binds to the cyclin–Cdk2 complex (1jsu). The kinase binding site is flanked by a ∼100-residue-long disordered linker, which enables T187 at the C-terminus to be phosphorylated. (D) Random. UmuD2 is a dimer that is produced from UmuD by RecA-facilitated self-cleavage (1i4v). The resulting proteins exhibit a random coil signal in circular dichroism experiments at physiologically relevant concentrations. Mechanistic categories: (E) Conformational selection. The fuzzy N-terminal acidic tail of the Max transcription factor (1nkp) facilitates formation of the DNA binding helix (dark red) of the leucine zipper basic helix–loop–helix (bHLH) motif. (F) Flexibility modulation.
The disordered serine/arginine-rich region of the Ets-1 transcription factor (1mdm) changes DNA binding affinity by 100–1000-fold by modulating the flexibility of the binding segment via transient interactions. (G) Competitive binding. The acidic fuzzy C-terminal tail of high-mobility group protein B1 (2gzk) competes with DNA for the positively charged binding surfaces. (H) Tethering. The binding of the virion protein 16 activation domain to the human transcriptional coactivator positive cofactor 4 (2phe) is facilitated by acidic disordered regions, which anchor the binding segments. Bound disordered regions can impact the interaction affinity and specificity of the complex and tune interactions of folded regions 176 with proteins or DNA. 251 Four different mechanisms have been proposed for the formation of fuzzy complexes (Figure 11E–H). (i) The first is conformational selection, in which the disordered region shifts the conformational equilibrium of the binding interface toward the bound form. The fuzzy N-terminal tail of the Max transcription factor, for example, reduces electrostatic repulsion in the basic helix–loop–helix (bHLH) domain and thereby facilitates formation of the DNA recognition helices, which increases binding affinity by 10–100-fold. 252 (ii) In the second mechanism, the disordered region(s) modulate the flexibility of the binding interface. The serine- and arginine-rich region of the Ets-1 transcription factor exemplifies this mechanism: it reduces DNA binding affinity by 100–1000-fold. 253 (iii) The third mechanism is competitive binding of the disordered region. Here, the IDR acts as a competitive inhibitor of other regions in the same protein for binding to a partner. The acidic fuzzy C-terminal tail of high-mobility group protein B1 (HMGB1) negatively regulates interaction of the HMG DNA binding domains by occluding the basic DNA-binding surfaces.
254 (iv) In the fourth mechanism, the disordered region serves to tether a weak-affinity binding region to increase its local concentration. For example, a fuzzy N-terminal domain anchors the human positive cofactor 4 (PC4) to several transactivation domains including the herpes simplex virion protein 16 (VP16). 255 All mechanisms of disordered complex formation affect binding to different degrees and can be further tuned by post-translational modifications. 176,251 PTMs in the disordered region may act as affinity tuners by modulating the charge available for biomolecular interactions. 256 6.2 Binding Plasticity Structural analysis of a large number of intrinsic disorder-based protein complexes resulted in another categorization of IDRs based on their binding plasticity (Figure 12). 257 Examples of relatively static IDR-based complexes are (i) mono- and polyvalent complexes, which typically consist of interactions between disordered segments and one or multiple spatially distant binding sites on their binding partners, respectively, (ii) chameleons, such as p53, that have different structures when binding to different proteins, (iii) penetrators that bury significant parts of the protein inside their binding partners, and (iv) huggers, which function in protein oligomerization, for example, by coupled folding and binding of disordered monomers. In addition to these relatively static complexes involving IDRs, one can identify coiled-coil-based complexes. Regions that make up coiled coils are typically highly disordered in monomeric state and gain helical structure upon coiled-coil formation, giving rise to several distinguishable types of complexes, such as intertwined strings, connectors, armatures, and tentacles. Figure 12 A portrait gallery of disorder-based complexes. Illustrative examples of various interaction modes of intrinsically disordered proteins are shown. Protein Data Bank 147 identifiers for the structures are given in parentheses. (A) MoRFs. 
Aa, α-MoRF, a complex between the botulinum neurotoxin (red helix) and its receptor (a blue cloud) (2NM1); Ab, ι-MoRF, a complex between an 18-mer cognate peptide derived from the α1 subunit of the nicotinic acetylcholine receptor from Torpedo californica (red helix) and α-cobratoxin (a blue cloud) (1LXH). (B) Wrappers. Ba, rat PP1 (blue cloud) complexed with mouse inhibitor-2 (red helices) (2O8A); Bb, a complex between the paired domain from the Drosophila paired (prd) protein and DNA (1PDN). (C) Penetrator. Ribosomal protein s12 embedded into the rRNA (1N34). (D) Huggers. Da, E. coli trp repressor dimer (1ZT9); Db, tetramerization domain of p53 (1PES); Dc, tetramerization domain of p73 (2WQI). (E) Intertwined strings. Ea, dimeric coiled coil, a basic coiled-coil protein from Eubacterium eligens ATCC 27750 (3HNW); Eb, trimeric coiled coil, salmonella trimeric autotransporter adhesin, SadA (2WPQ); Ec, tetrameric coiled coil, the virion-associated protein P3 from Caulimovirus (2O1J). (F) Long cylindrical containers. Fa, pentameric coiled coil, side and top views of the assembly domain of cartilage oligomeric matrix protein (1FBM); Fb, side and top views of the seven-helix coiled coil, engineered version of the GCN4 leucine zipper (2HY6). (G) Connectors. Ga, human heat shock factor binding protein 1 (3CI9); Gb, the bacterial cell division protein ZapA from Pseudomonas aeruginosa (1W2E). (H) Armature. Ha, side and top views of the envelope glycoprotein GP2 from Ebola virus (2EBO); Hb, side and top views of a complex between the N- and C-terminal peptides derived from the membrane fusion protein of the Visna (1JEK). (I) Tweezers or forceps. A complex between c-Jun, c-Fos, and DNA. Proteins are shown as red helices, whereas DNA is shown as a blue cloud (1FOS). (J) Grabbers. Structure of the complex between βPIX coiled coil (red helices) and Shank PDZ (blue cloud) (3L4F). (K) Tentacles. 
Structure of the hexameric molecular chaperone prefoldin from the archaeon Methanobacterium thermoautotrophicum (1FXK). (L) Pullers. Structure of the ClpB chaperone from Thermus thermophilus (1QVR). (M) Chameleons. The C-terminal fragment of p53 gains different types of secondary structure in complexes with four different binding partners, cyclin A (1H26), sirtuin (1MA3), CBP bromo domain (1JSP), and s100ββ (1DT7). Panels A–M reprinted with permission from ref (257). Copyright 2011 The Royal Society of Chemistry. (N) Dynamic complexes. Schematic representation of the polyelectrostatic model of the Sic1–Cdc4 interaction. An IDP (ribbon) interacts with a folded receptor (gray shape) through several distinct binding motifs and an ensemble of conformations (indicated by four representations of the interaction). The intrinsically disordered protein possesses positive and negative charges (depicted as blue and red circles, respectively) giving rise to a net charge ql , while the binding site in the receptor (light blue) has a charge qr . The effective distance ⟨r⟩ is between the binding site and the center of mass of the intrinsically disordered protein. Panel N was reprinted with permission from ref (243). Copyright 2010 John Wiley & Sons, Inc. 7 Evolution Disordered regions typically evolve faster than structured domains. 51−56,107 This behavior largely stems from a lack of constraints on maintaining packing interactions, which drives purifying selection in structured sequences. 258 However, disordered residues do display a wide range of evolutionary rates (Box 2). The following section discusses the evolutionary classifications of disordered protein regions. IDRs with similar functions and properties tend to have similar evolutionary characteristics. 7.1 Sequence Conservation While the amino acid sequence of disordered regions evolves at different rates, the property of disorder is usually conserved for functional sequences. 
54,159 Sequence conservation of IDRs varies according to their specific functions and provides another means for their classification. 54,259,260 Three biologically distinct classes of IDRs with specific function were identified using a combination of disorder prediction and multiple sequence alignment of orthologous groups across 23 species in the yeast clade (Figure 13): (i) flexible disorder describes regions where disorder is conserved but that have quickly evolving amino acid sequences (i.e., there is a requirement to be disordered, regardless of the exact sequence), (ii) constrained disorder describes regions of conserved disorder with also highly conserved amino acid sequences, and (iii) nonconserved disorder, where not even the property of being disordered is conserved in closely related species. For flexible disorder, low sequence conservation is expected if the property of disorder itself, as opposed to disorder in combination with specific sequence, is the only requirement for function. Examples of functions that mainly require the biophysical flexibility of disordered regions are entropic springs, spacers, and flexible linkers between well-folded protein domains. 37,39,57,58 The linker in RPA70 is an example where the dynamic behavior is conserved even when the sequence conservation is low. 60 Flexible disorder is the most common of the three evolutionary classes with just over one-half of disordered residues in yeast. It appears to account not just for the “flexibility” functions mentioned above, but also for many of the characteristics traditionally associated with disordered regions, such as strong association with signaling and regulation processes, 35,50,104,190,261,262 rapid sequence evolution, 51−56,107 the presence of short linear motifs (which are themselves conserved, see below), 47,72 and tight regulation (see section 8). 
68,263 By contrast, constrained disorder (about a third of disordered residues in yeast) is associated with different properties and functions, such as chaperone activity and RNA-binding ribosomal proteins. 54 Many proteins that contain the evolutionarily constrained type of disorder can adopt a fixed conformation, suggesting that these regions might undergo folding upon binding to their targets. This structural transition might impose a high degree of local structural constraints, which results in constraints on the protein sequence alongside requirements to be flexible. 54 Constrained disordered residues also occur more often in annotated protein sequence families (domains) than flexible disorder, but both types are strongly depleted in domains compared to structured regions. In human, both flexible and constrained disorder are enriched in proteins functioning in differentiation and development, 264 which reflects the importance of IDPs in these processes. Finally, nonconserved disorder accounts for around 17% of disordered residues in yeast and appears to be largely nonfunctional. Figure 13 Classification of disordered regions according to their evolutionary conservation (constrained, flexible, and nonconserved disorder). (A) Schematic of computing disorder conservation and amino acid sequence conservation. The alignments are used to calculate the percentage of sequences in which a residue is disordered and the percentage of sequences in which the amino acid itself is conserved. A residue is considered to be conserved disordered if the property of disorder is conserved in at least one-half of the species. Similarly, the amino acid type of a residue is considered conserved if it is present in at least one-half of the species. Disordered residues in which both sequence and disorder are conserved are referred to as constrained disorder. Disordered residues in which disorder is conserved but not the amino acid sequence are referred to as flexible disorder. 
Residues that are disordered in S. cerevisiae but are not cases of conserved disorder are referred to as nonconserved disorder. (B) Disorder splits into three distinct phenomena. Functional enrichment maps of proteins enriched in flexible disorder versus constrained disorder. The area of each rectangle is proportional to the occurrence of that type of disorder in the alignments. Related gene ontology terms are grouped based on gene overlap. Reprinted with permission from ref (54). Copyright 2011 Springer Science + Business Media. Short linear motifs (see section 3.1) 48,125 constitute a special case. Even though SLiMs almost exclusively lie within disordered regions, their own amino acid sequence tends to be conserved. 48 These properties, together with the difficulty of aligning rapidly evolving disordered sequences, cause the motifs to appear to move around when their positions are compared across different sequences. In fact, not only do motifs move around (due to insertions and deletions of amino acids around the motif in the sequence 67,265 ), they can also permute their positions with respect to other structural and functional modules. For example, SUMO modification sites in p53 are seen after and before the oligomerization domain in human and fly, respectively. 266 Such behavior could emerge by convergent evolution and loss of the motif in the original site, as only a few amino acids need to mutate to make a new motif elsewhere in the sequence. As long as the position of the motif with respect to the other modules does not affect function, such permutations will not affect fitness and hence may emerge relatively easily during evolution. These are indeed confounding issues when aligning disordered regions among orthologous proteins to identify functional motifs. 
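The per-residue rule given in the caption of Figure 13 can be sketched in a few lines of Python. This is only an illustration of the one-half thresholds stated there, not the pipeline used in ref (54): the function name, the input layout (one aligned column of amino acids and disorder predictions per residue), the treatment of alignment gaps as "not conserved", and the choice to count only the orthologs rather than the reference species are all assumptions made for the example.

```python
from typing import Optional, Sequence


def classify_disordered_residue(
    ref_aa: str,
    ref_disordered: bool,
    ortholog_aas: Sequence[str],
    ortholog_disordered: Sequence[bool],
    threshold: float = 0.5,  # "at least one-half of the species" (Figure 13 caption)
) -> Optional[str]:
    """Classify one reference residue as constrained, flexible, or nonconserved disorder.

    Hypothetical helper: inputs describe a single alignment column.
    Gaps ('-') in orthologs should be passed with a disorder flag of False,
    so they count against both disorder and sequence conservation (an assumption).
    """
    if not ref_disordered:
        # Only residues predicted as disordered in the reference are classified.
        return None
    n = len(ortholog_aas)
    if n == 0 or n != len(ortholog_disordered):
        raise ValueError("need one amino acid and one disorder flag per ortholog")

    # Fraction of orthologs in which the aligned residue is disordered.
    disorder_fraction = sum(ortholog_disordered) / n
    # Fraction of orthologs carrying the same amino acid as the reference.
    identity_fraction = sum(aa == ref_aa for aa in ortholog_aas) / n

    if disorder_fraction < threshold:
        return "nonconserved"
    if identity_fraction >= threshold:
        return "constrained"
    return "flexible"


if __name__ == "__main__":
    # Disorder conserved in 3 of 4 orthologs, amino acid conserved in 1 of 4.
    print(classify_disordered_residue("S", True, "ATGS", [True, True, False, True]))
```

Applying such a function to every disordered residue of a reference proteome and summing the labels would give the kind of class proportions quoted above for yeast, although the actual numbers depend on the disorder predictor, the alignments, and the gap handling chosen.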
In many ways, the disordered regions that contain SLiMs constitute flexible disorder according to the above classification, as their main role is to provide flexibility to enable access to the linear motif for proteins that will bind them as ligands 267 or introduce post-translational modifications. 47,48 Phosphorylation sites are closely related to short linear motifs that function in binding, but are often too short and weakly conserved to recognize via computational means. 268 More than 90% of sites phosphorylated by the yeast Cdk1 are in predicted disordered regions, 67 consistent with previous studies highlighting the importance of IDRs as display sites for phosphorylation and other PTMs (see sections 2.2 and 3.1). 45,46 Comparison of the phosphorylation sites in orthologues of the Cdk1 substrates revealed that the precise position of most phosphorylation sites is not conserved. Instead, clusters of sites move around in the alignment of rapidly evolving disordered regions. 69,250,269 Another example of the role of flexible disorder in signaling and regulation is the yeast serine-arginine protein kinase Sky1, which regulates proteins involved in mRNA metabolism and cation homeostasis. The Sky1 C-terminal loop is intrinsically disordered and contains phosphosites that are important for regulating its kinase activity. 270 Conservation analysis has shown that the loop is conserved for disorder but not for sequence. 54 The combination of sequence conservation of IDRs and conservation of their amino acid composition between human and seven other eukaryotes (chimp, dog, rat, mouse, fly, worm, and yeast) also identifies functional preferences. 260 IDRs with high residue conservation (HR) are enriched in proteins involved in transcription regulation and DNA binding. 
Low residue conservation in combination with high conservation of the amino acid type composition (LRHT) of the IDR (i.e., high similarity of overall amino acid composition between the human IDR and its orthologs) is often associated with ATPase and nuclease activities. Finally, IDRs that show neither conservation of sequence nor conservation of amino acid composition (LRLT) are abundant in (metal) ion binding proteins. 7.2 Lineage and Species Specificity Increasingly complex organisms have higher abundances of disorder in their proteomes. 35,271 An average of 2% of archaeal, 4% of bacterial, and 33% of eukaryotic proteins have been predicted to contain regions of disorder over 30 residues in length, 35 although there is much variation within kingdoms. 272,273 In human, 31% of proteins are more than 35% unstructured, 68 and 44% contain stretches of disorder longer than 30 residues 49,161,208 (similar data shown in Figure 2A). Human IDPs are spread relatively uniformly across the chromosomes, with percentages ranging from 38% (for genes encoding IDPs on chromosome 21) to 50% on chromosomes 12 and X. 161 A computational analysis of disorder in prokaryotes has corroborated the higher abundance of disorder in Bacteria as compared to Archaea. 274 Moreover, in agreement with the low abundance of disorder in prokaryotes, none of the 13 mitochondrial-encoded proteins are disordered. 161 Systematic analysis of IDP occurrence in 53 archaeal species showed that disorder content is highly species-dependent. 275 For example, Thermoproteales and Halobacteria proteomes have 14% and 34% disordered residues, respectively. Harsh environmental conditions seem to favor higher disorder contents, suggesting that some of the archaeal IDPs evolved to help accommodate hostile habitats. 276 Structural disorder is more common in viruses than in prokaryotes. 277 The characteristics of IDRs seem well suited to viruses, especially small RNA viruses with extremely compact genomes. 
278,279 For example, disordered regions could buffer the deleterious effects of mutations introduced by low-fidelity virus polymerases better than would structured domains. 277 The flexibility of IDRs to interact with many different proteins, such as proteins of the host immune system, is another useful feature for compact viruses because it maximizes the amount of functionality they encode while minimizing the required genetic information. 280 At the same time, several human innate immunity proteins have predicted disordered regions that could be important for their pathogen defense function. 281 For example, the RIG-I-like receptors (RLRs) RIG-I and MDA5 recognize different types of viral double-stranded RNA (dsRNA). 282 This functional divergence is partly achieved by differential flexibility of a loop that is rigid in RIG-I, but disordered in MDA5, resulting in different RNA binding preferences. 283 Furthermore, the disordered linker between the RNA-binding domains and the two N-terminal CARD (caspase activation and recruitment) domains of MDA5 helps facilitate oligomerization of the CARD domains, which initiates downstream signaling. 283 Activated RIG-I and MDA5 promote the formation of prion-like aggregates of the CARD domains of MAVS (mitochondrial antiviral-signaling). 284 MAVS has a highly disordered central region that contains multiple phosphorylation sites and interacts with several proteins, such as TRAF2 and TRAF6 through their respective consensus binding motifs (PxQx[TS] and PxExx[FYWHDE], respectively). 285 These interactions are part of a signaling pathway that activates the transcription factors IRF3/7 and NF-κB, leading to the expression of proinflammatory cytokines such as IFN-α/β and various proteins with direct antiviral activity. 282 For example, to counteract viral infection, protein kinase R (PKR) phosphorylates the translation initiation factor eIF2α in the presence of dsRNA, which reduces global protein synthesis in the cell. 
286 PKR contains a long disordered interdomain region that may become ordered upon RNA binding and could affect PKR dimerization. 287,288 Interestingly, viruses counteract PKR action by mimicking eIF2α and competing for PKR binding, as has been shown in the case of the poxvirus protein K3L. 289 PKR is under intense positive selection to keep recognizing eIF2α while minimizing interaction with viral antagonists. 289 Many of the changing sites in PKR are in a dynamic loop near the interaction interface with both eIF2α and K3L. 290 Similarly, recognition of retrovirus capsids by the restriction factor TRIM5α is mediated by disordered regions in the SPRY domain, which bear many positively selected residues that are essential for the antiviral activity. 291 The SPRY domain exists as an ensemble of disordered conformations that determine the specificity and affinity of the interaction between TRIM5α and the viral capsid. 292−294 In this way, the evolutionary flexibility of disordered regions (see section 7.1) provides opportunities for proteins of the host immune system to compete with rapidly changing pathogens while maintaining their functionality. In addition to the variation in prevalence of disordered regions between species, different kingdoms of life seem to use conserved IDRs for different functions: eukaryotic and viral proteins use disorder mainly for mediating transient protein–protein interactions in signaling and regulation, while prokaryotes use disorder mainly for longer lasting interactions involved in complex formation. 159 Thus, knowledge on the lineage, species, and origin of a disordered region could help in predicting its likely function. 7.3 Evolutionary History and Mechanism of Repeat Expansion Tandem repeats are enriched for intrinsic disorder (see section 5.7), and IDRs are increasingly abundant in increasingly complex organisms (see section 7.2). 
The genetic instability of repetitive genomic regions in combination with the structurally permissive nature of IDRs might have driven the increase in the amount of disorder during evolution. Disordered repeat regions have been shown to fall into three categories, based on their evolutionary history and acquired functional properties (Figure 14): 61 type I regions have not undergone functional diversification after repeat expansion (e.g., the titin PEVK domain), type II repeats have acquired diverse functions due to mutation or differential location within the sequence (e.g., the C-terminal domain of eukaryotic RNA polymerase II), and type III regions have gained new functions as a consequence of their expansion per se (e.g., the prion protein octarepeat region). Figure 14 Repeat expansion creates IDRs. IDRs are abundant in repeating sequence elements, which suggests that repeat expansion is an important mechanism by which genetic material encoding for structural disorder is generated. The expanding repeats may fall into three classes (types) in terms of their functional diversification following expansion. Individual repeats may remain functionally equivalent (type I), or diversify (type II), or collectively acquire a completely new function (type III). Dark-tone red indicates structural disorder of the repeat, which may undergo full (dark-tone blue) or partial (green) induced folding upon binding to a partner. Adapted with permission from ref (61). Copyright 2003 John Wiley & Sons, Inc. 8 Regulation Altered availability of IDPs is associated with diseases such as cancer and neurodegeneration. 190,263,295−299 Indeed, genes that are harmful when overexpressed (i.e., dosage-sensitive genes) often encode proteins with disordered segments. 300 Multiple mechanisms at different stages during gene expression (from transcript synthesis to protein degradation) control the availability of IDPs. 
68 Their tight regulation ensures that IDPs are available in appropriate levels and for the right amount of time, thereby minimizing the likelihood of ectopic interactions. Disease-causing altered availability of IDPs may result in imbalances in signaling pathways by sequestering proteins through nonfunctional interactions involving disordered segments (i.e., molecular titration 263 ). The following section discusses possible functional roles of proteins with IDRs based on their cellular regulatory properties such as transcript abundance, alternative splicing, degradation kinetics, and post-translational processing. 8.1 Expression Patterns Five different expression patterns were identified for transcripts encoding highly disordered proteins by investigating the mRNA levels from over 70 different human tissues and comparing the number of tissues in which IDP transcripts are expressed against the level of expression (Figure 15). 208 The expression classes are associated with specific functions. (i) The first subgroup (Figure 15, light blue markers) shows constitutive high expression in all tissues and consists exclusively of large ribosomal subunit proteins, which are almost entirely disordered. (ii) The second group (blue-green) represents transcripts that show high expression levels in the majority of tissues. These often function as protease inhibitors, splicing factors, and complex assemblers. (iii) Moderately expressed transcripts (green) typically encode disordered proteins involved in DNA binding and transcription regulation. (iv) IDPs that are expressed in a tissue-specific manner (yellow) are enriched for cell organization regulators, transcription cofactors, and factors that promote complex disassembly. Finally, (v) the remaining transcripts form a group (gray) not detected to be abundant in any of the tissues studied. This low and transient expression group contains more than one-half of the IDP transcripts analyzed and has a variety of functions. 
Figure 15 A summary of expression–function trends for human transcripts encoding highly disordered proteins. The x-axis represents the log10 number of tissues in which the transcript is expressed; the y-axis represents the log10 average magnitude of expression within the tissues. From the data, five distinct functional classes of highly disordered human proteins become apparent. Adapted with permission from ref (208). Copyright 2009 Springer Science + Business Media. 8.2 Alternative Splicing Trends in transcriptional regulation (alternative promoter and polyadenylation site usage) and post-transcriptional regulation (alternative splicing by inclusion or exclusion of exons) can also be informative of the role that specific disordered protein regions play in the cell (Figure 16). Alternatively spliced exons are overall more likely to encode intrinsically disordered rather than structured protein segments. 161,301−303 This tendency is even more pronounced in alternative exons whose inclusion or exclusion is regulated in a tissue-specific manner. 304 IDRs that are encoded by these tissue-specific alternative exons frequently influence the choice of protein interaction partners and can be instrumental in protein regulation 304,305 by embedding binding motifs and residues that can be post-translationally modified. 304 However, simple alteration of the length of a disordered region 306 can also modulate the overall protein function (Figure 16). Changes in IDR length can be an effective mechanism for modifying the affinity of interactions that a protein makes, particularly in instances where a disordered region is responsible for the positioning of protein binding motifs or domains. 307,308 Among the alternative exons, those that exhibit conserved splicing patterns across different species are particularly likely to have important regulatory roles. 
For example, tissue-specific exons, which are alternatively spliced in multiple different mammals, remarkably often contain IDRs with embedded phosphosites. 309 Disordered regions encoded by these exons are hence likely to act as modulators of protein function depending on the tissue where they are expressed. 309 While tissue-specific exons that are alternatively spliced in a conserved fashion often code for phosphosites, the emergence of novel exons in a gene, although at first likely detrimental, 310 is a possible template for the evolution of short interaction motifs. 311 Furthermore, changes in exon regulation can also be important for the emergence of novel adaptive functions. Accordingly, protein segments encoded by exons, which are alternatively spliced either in a single species or in a whole evolutionary lineage, are enriched in short binding motifs, and alternative inclusion of disordered regions encoded by these exons is conceivably a source of evolutionary novelty. 312 Figure 16 Transcriptional and post-transcriptional gene regulation can be informative of IDR function. How inclusion of exons that code for IDRs is regulated during gene transcription and alternative splicing can give insights into the functional roles of the encoded disordered regions. For example, tissue- or developmental-specific regulation of alternative splicing or alternative promoter and polyadenylation site usage can be associated with important roles of the encoded IDRs in protein regulation and cellular interactions through, for example, the presence of binding motifs and phosphosites. Additionally, information on the conservation of patterns of exon inclusion (i.e., events shared among different evolutionary lineages versus species-specific events) can aid in better characterization of the encoded IDRs. 
The figure illustrates a hypothetical example where an exon (largest red box) that is included in a tissue-specific manner both in human and in mouse encodes an IDR that embeds a phosphosite (P) and is involved in protein regulation. The human gene depicted in the figure has an additional exon (smallest red box), which encodes an IDR with a short interaction motif and which is also included in a tissue-specific manner in humans. Gene structures, mature mRNAs, and corresponding protein isoforms are shown for human and mouse brain and heart tissues. On the right, possible functional roles of the IDRs encoded by the brain isoforms are illustrated. The examples illustrate how protein functional space can increase due to alternative splicing of exons that encode IDRs. Adapted with permission from ref (304). Copyright 2012 Elsevier. In addition to the tendency of cassette alternative exons to frequently encode IDRs, exons adjacent to the alternatively spliced ones are also likely to code for disordered regions around the insertion point for the alternatively spliced segment. 264,302 These disordered regions not only provide the structural flexibility that tolerates both presence and absence of the alternatively spliced segment, but they can also contain interaction motifs themselves. 264 Furthermore, on the transcriptional level, diversity in protein isoforms can be created through both alternative splicing and usage of alternative promoters and polyadenylation sites. Protein segments that are encoded by the two latter mechanisms can contain disordered regions with motifs that define protein localization and stability. 313 Taken together, these examples illustrate how better understanding of gene regulation and knowledge of evolutionarily conserved and novel isoforms can provide insights into possible functional roles of whole proteins and specific protein regions. 
8.3 Degradation Kinetics Another emerging functionality of disordered regions is their role in protein degradation. 314−321 Protein half-life generally correlates with the fraction of disordered residues, 68,317 and proteins that get ubiquitinated specifically upon heat shock stress are typically disordered. 322 Although ubiquitination by E3 ligases has a dominant role in recruiting proteins to the proteasome for degradation, 323,324 some IDRs of sufficient length allow for efficient initiation of degradation by the proteasome independent of the ubiquitination status. This idea is supported by in vitro experiments showing that degradation of tightly folded proteins is accelerated when a disordered region is attached to model substrates. 315,321 Efficient degradation only occurs when the disordered terminal region is of a certain minimal length, 321 and degradation may be initiated by IDRs either at the protein terminus or internally. 314−321 Proteins that contain IDRs of sufficient length may therefore have increased turnover, although the exact length requirements will depend on the substrate. At the same time, not all IDRs influence protein half-life. For example, disordered polypeptides with specific amino acid compositions such as glycine-alanine and polyglutamine repeats can attenuate rather than accelerate degradation by the proteasome. 325−327 The formation of protein complexes or transient interactions with other proteins may also protect IDPs from degradation. Thus, we can distinguish a novel functional class of IDRs: those that influence protein degradation (degradation accelerators) versus those that do not. These properties might be associated with specific protein function. For example, proteins that contain IDRs of a given length are probably more susceptible to degradation, possibly linking them to functions of IDPs with low expression. 
Some highly disordered proteins (e.g., p53, p73, IκBα, BimEL) can, at least in vitro, be degraded by the 20S proteasome independent of ubiquitination. 328−333 Specialized proteins termed “nannies” have been shown to bind to and protect IDPs from ubiquitin-independent 20S proteasomal degradation. 334 A free IDP, such as newly synthesized p53, might be degraded by the 20S proteasome, which leads to fast degradation kinetics. After a nanny binds the IDP (Hdmx in the case of p53), slower, ubiquitin-dependent degradation by the 26S proteasome takes place. This biphasic decay has been proposed as a way to distinguish structured proteins from IDPs and the proteins that protect them from degradation. 334 8.4 Post-translational Processing and Secretion The majority of secretory proteins are targeted to the endoplasmic reticulum (ER) via an N-terminal signal peptide, which helps to initiate translocation of nascent chains into the ER. 335,336 Bioinformatic analysis of proteins containing N-terminal ER signal peptides has identified only 10% of these proteins as IDPs (>70% disordered), suggesting that IDPs are under-represented in the secretome. 337 The fact that secreted proteins are rarely IDPs might be partially explained by the requirement for largely disordered proteins to contain an α-helical prodomain for correct import into the ER lumen, 338 as demonstrated for intrinsically disordered prohormones. 337 IDPs lacking this structured, α-helical domain were subjected to ER-associated degradation (ERAD) despite the presence of a signal peptide. 338 Despite the relative depletion of IDPs in the secretome, a number of important IDPs are processed within the ER, including many prohormones, 337,339 components of the extracellular matrix, 340 and proteins involved in biomineralization (see section 9.3). 
117,341,342 Pre-pro-opiomelanocortin (pre-POMC) is a disordered 285 amino acid protein whose signal peptide is removed during translation to create the 241-residue pro-opiomelanocortin (POMC). This prohormone has at least eight putative basic-rich cleavage sites and is able to yield as many as 10 biologically active peptides including adrenocorticotropic hormone (ACTH) and β-endorphin. The processing of POMC is tissue-specific and depends on the type of convertase enzyme expressed. 343 Other prominent examples of disordered extracellular proteins are elastin and other components of elastic fibers, 344 small integrin-binding ligand N-linked glycoproteins (SIBLINGs) (see section 9.3), 340−342,345 and mucins (see section 9.2). 236 Thus, although secreted proteins are not particularly enriched for structural disorder overall, some IDPs are essential for biomineralization, tissue organization, and hormonal signaling. In line with the features of intracellular IDPs, extracellular structural disorder is heavily post-translationally modified and involved in extensive interactions that organize large molecular assemblies while binding multiple interaction partners. 117,341,342 9 Biophysical Properties A large range of biophysical work has been carried out on structural disorder in proteins using a variety of experimental techniques (Box 2). 346 Previous sections have touched on several aspects. Disordered regions rapidly shift within a continuum of variably extended or globular conformations and are best described as dynamic ensembles (see section 4). The amino acid sequence of a disordered region determines which conformations it can sample, depending for example on the charge properties (see section 5.1). Disordered proteins frequently fold upon binding, and their binding thermodynamics allow for fast, transient, but highly specific interactions (see sections 2, 3, and 6). 
The following section discusses three other physical properties that are essential for the biology of some IDRs and IDPs: solubility, the ability to undergo phase transitions, and the role in biomineralization. 9.1 Solubility The solubility of a protein depends upon the favorability of its interactions with water. Globular proteins bury hydrophobic amino acids within their solvent-excluded cores, while their surfaces are generally enriched in polar and charged amino acids that interact favorably with water, leading to aqueous solubility. 347,348 The presence of hydrophobic surface residues, for example, binding sites for other proteins, and the denaturation of otherwise folded proteins lead to the exposure of hydrophobic residues to water and reduce solubility, sometimes leading to aggregation and precipitation. Disordered proteins do not spontaneously fold into globular structures because their sequences are deple