ChSeq: A database of chameleon sequences

Grishin, Nick V.; Karplus, P. Andrew; Kinch, Lisa N.; Li, Wenlin

doi:10.1002/pro.2689

Author(s): Wenlin Li, Lisa N. Kinch, P. Andrew Karplus, Nick V. Grishin

Publication date (print): July 2015

Journal: Protein Science

Publisher: Wiley-Blackwell

Most cited references (54)

ECOD: An Evolutionary Classification of Protein Domains

Hua Cheng, R. Schaeffer, Yuxing Liao … (2014)

Introduction The billions of proteins in extant species constitute a bewilderingly diverse protein world. To understand this world, systematic classifications are needed to reduce its complexity and to bring order to its relationships. As proteins are the products of evolution, their phylogeny provides a natural foundation for a meaningful hierarchical classification. As in the classification of species, a phylogenetic classification of proteins identifies evolutionary relationships between proteins and groups homologs (proteins that are descendants of a common ancestor) together. Because homologs generally share similar three-dimensional (3D) structures and functional properties, such a classification provides a valuable platform for studying the laws of protein evolution by comparative analysis as well as for predicting structure and function by homology-based inference. Many protein classifications are currently available. Comprehensive sequence-based classifications such as Pfam [1] and CDD [2] are among the most popular protein annotation tools. When sequence-only methods fail to reveal more distant evolutionary links, 3D structures allow us to see further back in time, as protein structure is generally better preserved than sequence in evolution [3]. Currently, the two leading structure classifications are SCOP (Structural Classification of Proteins) [4] and CATH (Class, Architecture, Topology, Homology) [5], both of which are widely used in analyzing protein sequence, structure, function, and evolution and in developing various bioinformatics tools. CATH (http://www.cathdb.info) is largely automatic with added manual curation and emphasizes more on geometry, while SCOP is mainly manual and focuses on function and evolution. 
In the SCOP [4] (http://scop.mrc-lmb.cam.ac.uk/scop/index.html) hierarchical classification, closely related domains are grouped into families; families with structural and/or functional similarities supporting common ancestry are grouped into superfamilies; superfamilies with similar 3D architectures and topologies are grouped into folds; and folds with similar secondary structure compositions are grouped into classes. Cataloging remote homologies identified by a combination of visual inspection, sequence and structure similarity search, and expert knowledge, the SCOP superfamily is the broadest level indicating homology and offers invaluable insights in protein evolution. However, SCOP tends to be conservative in assessing evolutionary relationships, and many homologous links reported in literature are not currently reflected [6], [7], [8], [9], [10], [11]. Also, the recent dramatic increase of available structures in the PDB [12] (http://www.pdb.org) hinders careful manual curation in SCOP. Recently, a new version of SCOP (SCOP2) [13] was introduced that eschews hierarchical classification in place of a network of relationships (homologous and structural), although this database has not been made current with PDB. To partially alleviate this problem, ASTRAL now offers SCOPe, a sequence-based extension of the original SCOP hierarchy [14]. Nevertheless, not a single protein classification database has kept current with the PDB database. We maintain that the most recently determined structures, especially those evolutionarily distant from classified proteins, attract the most interest and hence are the most important to classify quickly and accurately. However, automatic updates, such as those in ASTRAL, are only able to deal with easily classifiable proteins. Here we introduce the ECOD (Evolutionary Classification Of protein Domains) database. 
Our goal is threefold: (1) to construct a comprehensive domain classification based on evolutionary connections, (2) to extend the realm of connections to include remote homology, and (3) to maintain concurrent updates with the PDB. Because experimental data is very sparse compared to sequence data, establishing an evolutionary-based classification scheme of structures allows for biological insight into related proteins that otherwise lack functional information. In such a scheme, close homologs admittedly represent the most relevant source of functional inference. However for most proteins, only distant homologs have been studied in detail. Fortunately, many examples have shown that analysis of proteins in the context of their distant homologs provides functional clues that advance biological research [15], [16], [17], [18]. In addition, remote homology offers deeper insights in protein evolution. In order to extend distant evolutionary relationships beyond the SCOP superfamily level in ECOD, we apply state of the art homology-inference algorithms both developed in our group [19], [20] as well as by others [21], [22], manually analyze and verify the suggested homologous links, and incorporate findings from literature. For weekly updates, we rely on a computational pipeline that automatically and confidently classifies the majority of newly released structures and flags incompletely classified and unclassifiable structures, as well as a web interface that presents those difficult to deal with structures and pre-computed data in a convenient way for rapid manual inspection and classification. ECOD is a publicly available database (http://prodata.swmed.edu/ecod/). By focusing on remote homology and weekly updates, ECOD strives to provide a more simplified and up-to-date view of the protein world than is currently available in existing classifications. 
As such, ECOD is unique in combining the following features: 1) the aforementioned weekly updates, following new releases from the PDB; 2) a hierarchy that specifically incorporates sequence-based relationships in a family level of close homology; 3) a classification that reflects more distant evolutionary connections; 4) a hierarchy that lacks a SCOP-like fold level, as the definition of “fold” is often subjective [23]; 5) domain partitions for all former members of the SCOP multi-domain protein class; and 6) combination of membrane proteins with their soluble homologs where an evolutionary relationship can be hypothesized. Theoretically, ECOD catalogs rich and up-to-date information about protein structure for the studies on protein origins and evolution; and practically, it helps homology-based structure and function prediction and protein annotation by providing a pre-compiled search database. Methods We first developed a pilot version of ECOD based on SCOP 1.75 [4]. To detect remote homologies beyond the SCOP superfamily level, 40% identity domain representatives in the first 7 classes in SCOP 1.75 were retrieved from ASTRAL [24] and compared in an all-versus-all fashion. Four scores were computed for each pair: HHsearch probability [21], DALI Z-score [22], HorA combined score [20], and HorA SVM score [19]. Domain pairs with high scores were manually inspected and analyzed. The decision on whether any given pair is homologous was based on considerations of the aforementioned scores, literature, functional similarity (such as common cofactor-binding residues), shared unusual structural features [25], domain organization, oligomerization states, and disulfide bond positions. Since the SCOP superfamily level is reliable and conservative, we typically only merged SCOP superfamilies into homologous (H-) groups. 
In addition to merging SCOP superfamilies, we split SCOP entries with multiple domains or with duplications, and corrected rare inconsistencies in the SCOP classification. Cytoscape [26] clustering was used to aid manual analysis by displaying domains and high-scoring links. After 40% representatives were classified, other SCOP 1.75 domains were automatically mapped into the ECOD hierarchy using MUSCLE alignments [27]. Many hierarchical groups in the ECOD pilot version retained the names of their original SCOP counterparts. Those structures not classified in SCOP 1.75 were partitioned and assigned to ECOD using a combination of sequence and structural homology detection methods. We used an iterative pipeline of three sequence homology detection methods of increasing sensitivity and decreasing specificity to partition input proteins into domains (Fig. 1). First, the input protein sequence is queried against a library of known ECOD full-length chains (containing both single-domain and multi-domain architectures) using BLAST [28],[29]. Where significant sequence similarity (E-value 80%). Finally, for detection of more distant homology, a query sequence profile was generated using HHblits [21]. This profile was used to query a database of ECOD representative domain profiles using HHsearch. Domains from the input chains could be classified by any combination of the three sequence-based methods (chain BLAST, domain BLAST, or domain HHsearch). Following partition, a boundary optimization procedure based on the structural domain parser, PDP, was run to eliminate small interstitial gaps between assigned domains and at termini [30]. 10.1371/journal.pcbi.1003926.g001 Figure 1 Workflow of the ECOD automatic domain classification pipeline. Unclassified structures enter from the top (white). Firstly, peptides, coiled-coils, and other unclassifiable regions are removed where possible and placed in their respective special architectures (orange). 
Secondly, unassigned regions of the input sequence are iteratively assigned by descending best hits from BLAST and HHsearch-based searches of ECOD databases. Assemblies of putative domains are optimized and assigned (green). If the chain is incomplete by sequence, a similar process occurs using DaliLite searches. If the query remains unclassified, it is manually curated (yellow). Input protein chains with a set of detected domains with full residue coverage from the sequence pipeline were considered to be complete. Domains from these chains were then assigned to the ECOD hierarchy broadly using the classification of their hit domain. Following this assignment a combination of HMMER/Pfam and HHsearch-based clustering was used to finely tune family assignments [1], [31]. Domains were clustered into F-groups by Pfam where confident HMMER3-based assignments could be found. Where domains had no confident Pfam assignment, all-versus-all HHsearch-based complete linkage clustering was used to generate an F-group [32] where all domains shared 90% HHsearch probability. We specifically designate provisional representatives in F-groups where no member shares close homology with a representative ECOD domain for manual examination. Input protein chains that could not be fully assigned by the sequence pipeline were passed to the structural pipeline. If a protein chain could not be assigned by the sequence pipeline, it was queried against a library of representative ECOD domain structures using DaliLite [33]. Domains were assigned where significant structural similarity existed to a known ECOD domain and where the aligned region passed a simple BLOSUM-based alignment score [34]. As in the sequence pipeline, the boundaries of structurally assigned domains were optimized, and those chains that could be completely assigned (100% residue coverage) were added to the classification. 
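The F-group clustering step described above (complete linkage, with every pair of domains in a group sharing at least 90% HHsearch probability) can be illustrated with a minimal sketch. This is not the ECOD pipeline's code: the function name `complete_linkage_groups`, the dictionary layout for pairwise scores, and the greedy merge order are assumptions made for illustration only.

```python
from itertools import combinations


def complete_linkage_groups(domains, prob, threshold=90.0):
    """Greedy agglomerative complete-linkage clustering (illustrative only).

    domains:   list of hashable domain identifiers (hypothetical IDs).
    prob:      dict mapping frozenset({a, b}) to a pairwise score, here
               standing in for HHsearch probability; missing pairs count as 0.
    threshold: minimum score that every pair inside a group must reach.
    """
    clusters = [frozenset([d]) for d in domains]

    def linkage(c1, c2):
        # Complete linkage: the weakest pairwise score between two clusters.
        return min(prob.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the strongest complete-linkage score.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining merge keeps all pairs above the threshold
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical scores for four domains.
scores = {
    frozenset(("d1", "d2")): 99.0,
    frozenset(("d1", "d3")): 92.0,
    frozenset(("d2", "d3")): 91.0,
    frozenset(("d3", "d4")): 95.0,
    frozenset(("d1", "d4")): 50.0,
    frozenset(("d2", "d4")): 40.0,
}
groups = complete_linkage_groups(["d1", "d2", "d3", "d4"], scores)
```

Because complete linkage uses the weakest pair between two clusters, d3 scores above 90 against both d1 and d2 yet is grouped with d4, and the two resulting groups are not merged because the d1-d4 and d2-d4 scores fall below the threshold.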
Where a chain could not be completely assigned, it was passed to the manual curators for boundary refinement or assignment. As we neared completion of the PDB, the need for structural search decreased as the number of remaining structures was small enough to manually curate. Difficult structures that could not be completely and confidently classified by the pipeline required manual curation. We first inspected the mapping suggested by the pipeline. Oftentimes, the suggested mapping was correct for most or part of the query structure, and we typically accepted this mapping but modified the domain boundaries. For other queries where the suggested mapping was wrong or absent, we used HorA server [20] to search for remote homologs. In evaluating HorA results, we applied the same considerations used in developing the ECOD pilot version to determine homology between a query and a hit. When a homologous hit with similar topology could be found, the query was classified into the same T-group as the hit; when a homologous hit with different topology could be found, the query was classified in a new T-group but the same H-group as the hit; when only a possibly homologous hit with similar overall structure could be found, the query was classified in a new H-group but the same X-group as the hit; when no possible homologs can be identified, the query is classified in a new X-group by itself (see Results and Discussion for a description of the ECOD hierarchy). To facilitate manual analysis, we developed a web interface that presented relevant information in a clear format as well as recorded and incorporated feedback and annotations from manual curators. Results/Discussion ECOD is a hierarchical classification of domains based on their evolutionary relationships. Focusing on remote homology, ECOD organizes domains into very broad homologous groups. At the same time, ECOD families address closer evolutionary relationships, detectable at a sequence level. 
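The manual placement rule described above (same T-group, new T-group, new H-group, or new X-group, depending on the best hit found) can be summarized as a small lookup. This is only a sketch of the stated rule; the names `Evidence`, `Placement`, and `place_query` are hypothetical and do not come from the ECOD software.

```python
from enum import Enum
from typing import NamedTuple, Optional, Tuple


class Evidence(Enum):
    """Best hit found for a query during manual curation (hypothetical labels)."""
    HOMOLOG_SAME_TOPOLOGY = "homologous hit with similar topology"
    HOMOLOG_DIFFERENT_TOPOLOGY = "homologous hit with different topology"
    POSSIBLE_HOMOLOG = "possibly homologous hit with similar overall structure"
    NONE = "no possible homolog identified"


class Placement(NamedTuple):
    inherited: Tuple[str, ...]   # hierarchy levels shared with the hit
    new_level: Optional[str]     # highest level at which a new group is created


def place_query(evidence: Evidence) -> Placement:
    """Map the curator's evidence to the X/H/T placement stated in the text."""
    if evidence is Evidence.HOMOLOG_SAME_TOPOLOGY:
        return Placement(inherited=("X", "H", "T"), new_level=None)
    if evidence is Evidence.HOMOLOG_DIFFERENT_TOPOLOGY:
        return Placement(inherited=("X", "H"), new_level="T")
    if evidence is Evidence.POSSIBLE_HOMOLOG:
        return Placement(inherited=("X",), new_level="H")
    return Placement(inherited=(), new_level="X")


example = place_query(Evidence.HOMOLOG_DIFFERENT_TOPOLOGY)
```

Family (F-group) assignment is left out of the sketch because the text describes it as a separate, sequence-based step.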
Most importantly, ECOD is comprehensive and up-to-date, including all entries in the PDB and updating weekly, thus uniquely providing researchers with the most current classification of protein domains at both distant and close homology levels. Database Description ECOD is a hierarchical classification with five main levels (Fig. 2, from top to bottom): architecture (A), possible homology (X), homology (H), topology (T), and family (F). The architecture level (A) groups domains with similar secondary structure compositions and geometric shapes. The possible homology level (X) groups domains where some evidence exists to demonstrate homology (but where further evidence is needed). The homology level (H) groups together domains with common ancestry as suggested by high sequence-structure scores, functional similarity, shared unusual features [25], and literature. The topology level (T) groups domains with similar topological connections. The family level (F) groups domains with significant sequence similarity (primarily according to Pfam, secondarily by HHsearch-based clustering). 10.1371/journal.pcbi.1003926.g002 Figure 2 Hierarchical levels of ECOD. Domains placed within the same Architecture share similar secondary structure content (helix, cyan; sheet, yellow) and geometric arrangement. Domains placed within the same X-group share similar structure but lack a convincing argument for homology (vs. analogy), while those placed within the same H-groups are homologous. X- and H- group structures are colored in rainbow by consecutive secondary structure elements. T-groups distinguish homologous domains with notable differences in topology, such as the illustrated Rift-related metafold [18]. Rift-related half-barrels (colored blue and red) are consistent among the domains, but permutations and strand swaps (green) modify the topology. ECOD has 20 architectures that were developed both by consulting SCOP fold descriptions and inspecting numerous structures. 
We note that clear-cut boundaries between architectures do not always exist and that domain assignment to an architecture is sometimes subjective. This level is introduced largely for convenience of users and does not directly correspond to evolutionary grouping. A-level lies in between SCOP class and fold and groups proteins by simple visual features such as bundles, barrels, meanders, and sandwiches. Coiled-coils, peptides, fragments, largely disordered structures, and low resolution structures were put in special architectures with no X-, H-, T-, or F-levels, as confident evolutionary classification of these structures is challenging at the moment. Nucleic acids, in addition to proteins, are kept within a special architecture and are not currently classified. Within architectures, X-groups are ordered by structural similarity between them. The ECOD X-level groups domains that may be homologous as is frequently suggested by similarity of their spatial structures. A domain's overall structure is traditionally referred to as its ‘fold’. Fold similarity usually refers to general resemblance in both architecture and topology and can result from either common ancestry (homology) or physical/chemical restrictions (analogy) [35],[36],[37]. Both SCOP and CATH have a fold level in the hierarchy: “SCOP fold” and “CATH topology”. However, the definition of fold can be subjective [23], and fold is a geometrical concept without explicit evolutionary meaning. Therefore, ECOD generally avoids the fold concept. However, domains that share strong overall architectural and topological similarity and are possibly homologous, but which lack further evidence to exclude analogy, are attributed to the same X-group but different H-groups. The conceptual difference between ECOD X-group and SCOP fold can be shown, for example, in the classification of domains with a ferredoxin-like topology. 
In SCOP, the ‘Ferredoxin-like’ fold is a large assembly of various superfamilies that share the (βαβ)×2 topology. Among all these superfamilies, 4Fe-4S ferredoxins seem unique for their small size and cysteine-rich nature (cysteines are used to coordinate the Fe-S clusters). Thus we suspect 4Fe-4S ferredoxins have an independent evolutionary origin and keep 4Fe-4S ferredoxins and other superfamilies in separate X-groups. On the other hand, although domains in the SCOP fold ‘Ribosomal proteins S24e, L23 and L15e’ do not have the ferredoxin-like (βαβ)×2 topology, their structures can easily be transformed into that topology by a circular permutation. Their structural similarity and functional similarity with the ‘RNA-binding domain, RBD’ superfamily in SCOP ‘Ferredoxin-like’ fold may imply homology. Therefore, ECOD classifies ‘Ribosomal proteins S24e, L23 and L15e’ and ‘RNA-binding domain, RBD’ as two H-groups in the same X-group as possible homologs. When further evidence coming either from additional sequences or 3D structures accumulates, classification decisions are adjusted to agree best with all available data. We examined the distribution of domains mapped to SCOP folds and CATH topologies among ECOD X-groups. Of 1,799 ECOD X-groups, 598 include domains from only one SCOP fold and 564 include domains from only one CATH topology, reflecting agreement between classifications for these groups. 89 ECOD X-groups contain domains from multiple SCOP folds and 315 X-groups include domains from multiple CATH topologies. For example, the SCOP folds c.1-TIM beta/alpha-barrel and c.6-7-stranded beta/alpha barrel both contain domains mapped to the ECOD TIM beta/alpha barrel X-group. ECOD unifies such groups due to their shared structural similarity (7- versus 8- stranded) and similar locations of functional sites, but with insufficient evidence of homology to belong to the same H-group. 
935 ECOD X-groups are not mapped to any SCOP fold, whereas 1,014 ECOD X-groups are not mapped to any CATH topology. The majority of these unmapped X-groups are simply due to proteins that are not classified by SCOP or CATH (722 and 872 X-groups, respectively); the remainder are shared proteins that are partitioned differently. Taken together, these results suggest that ECOD tends to merge both SCOP folds and CATH topologies into X-groups. An ECOD H-group can contain more distant homologous links than the equivalent SCOP superfamily or CATH homologous superfamily. Although the majority of ECOD H-groups contain only a single SCOP superfamily (88%) or CATH homologous superfamily (81%), some H-groups contain many more (Fig. 3). For example, the Immunoglobulin-related and the Rossmann-related H-groups contain the most SCOP superfamilies (47 and 28, respectively) and CATH homologous superfamilies (81 and 40, respectively). Superfamilies were merged based on multiple high-scoring homologous links between domains. These merges reflect the homology between domain members of these previously split groups. 10.1371/journal.pcbi.1003926.g003 Figure 3 Number of ECOD H-groups containing 1 or more SCOP superfamily (blue) or CATH homologous superfamily (red). The majority contain only a single SCOP superfamily (88%) or CATH homologous superfamily (81%). The most merged (not shown) ECOD H-group is the Immunoglobulin-related domains, which contains 47 SCOP superfamilies and 81 CATH homologous superfamilies. In total, 53 ECOD H-groups contain domains from two or more SCOP folds, and these H-groups contain domains from 151 unique SCOP folds, indicating that fold change in evolution of protein structures is not a very uncommon phenomenon. Similarly, 169 ECOD H-groups contain domains from two or more CATH topologies, and these H-groups contain domains from 357 unique CATH topologies. 
Additionally, 36 H-groups contain domains mapped to more than one CATH class, indicating homologous domains that nonetheless contain fairly different topologies. To readily incorporate the observation that homologs can adopt different folds, ECOD has a topology (T-) level below the homology (H-) level. As a result, homologs with different topologies that SCOP necessarily separates into different folds (and thus different superfamilies) are unified in the same H-group but different T-groups in ECOD. For example, β-propellers are comprised of differing numbers of repeated β-meanders, all of which are evolutionarily related. The five different beta-propeller folds outlined in SCOP are organized in ECOD into a single H-group, with child T-groups for domains with differing number of blades [38]. Also, the domain contents of 11 SCOP folds are organized into multiple T-groups under the Rift-related H-group in the cradle-loop barrel X-group [39]. If we find sufficient evidence for homology between these proteins, this consideration results in merging not only SCOP superfamilies, but also SCOP folds. Within T-groups, ECOD organizes domains into families based on sequence similarity. We employ Pfam as the standard for family definition. ECOD domains were attributed to Pfam families by HMMER3 [31]. Therefore, the majority of ECOD F-groups are simply Pfam families. However, not all protein domains with known structure can be attributed to the current version of Pfam by sequence similarity. Those domains are grouped into families by HHsearch as outlined in Materials and Methods. As a result, ECOD contains 8,947 F-groups, 7,156 of which can be mapped to Pfam families, and 1,622 composed of homologous domains not mapped to any Pfam family. Summary Statistics of ECOD Summary statistics for the ECOD database as of July 31st, 2013 (version 22b) are presented in Table 1. 
The majority of the 317,021 domains in ECOD were assigned automatically to a smaller set of 15,969 manually curated domain representatives. Domains in ECOD were derived from five sources: 1) domains originally in SCOP ASTRAL40, inherited and reclassified manually in ECOD (11,462), 2) domains originally in SCOP, but not in the ASTRAL40 set, mapped by MUSCLE alignment with their ASTRAL representative (98,702), 3) novel domains not contained in SCOP, usually from chains deposited to the PDB in the intervening period between the release of SCOP v1.75 and ECOD, manually curated and added to the representative set (4,373), 4) domains automatically added to ECOD by detection of homology by pairwise sequence or structure search (153,381), and 5) domains added to ECOD by MUSCLE alignment of non-representative sequences to closely related ECOD representatives (48,817). The vast majority of domains classified in ECOD have been added by automatic methods. ECOD provides for domains which are assembled from multiple PDB chains, either due to photolytic cleavage (i.e. order-dependent assembly) or obligate multimers (i.e. order-independent assemblies). For order-independent assemblies, we distinguish between those domains where the assembly is primarily relevant for display, or appears to be biologically necessary. These are fairly rare in the database; only 132 representative order-independent assemblies have been defined. At the time of writing, 100% of PDB depositions could be accounted for in the ECOD classification (including those members of the special architectures).
Table 1 (10.1371/journal.pcbi.1003926.t001). Summary statistics of ECOD v22b (July 31 2013).
Level | Population
Architectures | 20
X-groups | 1,799
H-groups | 2,279
T-groups | 2,865
F-groups | 9,013
Manual representatives | 15,969
Domains | 317,021
95% nonredundant domains [1] | 50,305
PDB structures | 93,663
Peptide chains | 239,303
[1] Domains were filtered using BLASTCLUST with a 95% sequence identity threshold and 90% length cutoff.
We also compare ECOD to the most recent releases of SCOP and CATH. ECOD, SCOP, and CATH differ in domain partition strategy, classification hierarchy, and simply in the number of structures considered. At the time of writing, ECOD classifies 93,663 PDB depositions containing 239,303 protein chains, SCOP 1.75 contains 38,221 PDBs and 85,141 chains, and CATH v3.5 contains 51,334 PDBs and 118,792 chains. Of those chains classified in ECOD that are not in SCOP (and not in a special architecture), 137,794 were automatically classified and 2,484 were classified manually. Of those chains classified in ECOD, but not in CATH (and not in a special architecture), 106,474 were automatically classified and 2,521 were classified manually. The growth of the PDB over time is compared to the number of structures classified in ECOD, CATH, and SCOP (Fig. 4(a)). The difference between the number of structures in the PDB and those in the main architectures of ECOD can be primarily accounted for by the number of structures contained in ECOD special architectures (i.e. coiled-coil, peptide, non-peptide polymers, and low-resolution structures that could not be classified by sequence). The growth of the hierarchical levels from 2000–2013 indicates that although evolutionary distinct groups (i.e. X- and H- groups) are being discovered at a steady pace, the predominant source of new domains in ECOD is from sequence families (F-groups) being associated with existing homologous groups (Fig. 4(b)). 10.1371/journal.pcbi.1003926.g004 Figure 4 Classification of ECOD and ECOD hierarchical levels with respect to the PDB and other classifications. A) A cumulative sum of PDB release dates from Jan-2000 to Jan-2014 (red) compared to classified PDB depositions in ECOD (green), SCOP (cyan), and CATH (blue). Any deposition with at least one domain classified is counted. ECOD consistently classifies more structures than SCOP and CATH and is more up-to-date. 
b) The cumulative sum of PDB deposition dates in ECOD hierarchical levels. Each group is classified once by its oldest deposition. The number of new levels increases consistently over time over the 2000 to 2014 time period. Classification of Weekly PDB Structure Updates Since the July 2013 version, whose statistics are presented here, the subsequent 25 weekly releases by the PDB have been automatically classified (Fig. 5). Each week, protein chains are clustered at 95% redundancy, representatives for those non-redundant chains are classified; those remaining chains are classified when the initial automatic and manual classification pass are completed. For each weekly update, the majority (∼89%) of non-redundant ( 500 95% representative domains) are colored by architecture. The immunoglobulin-related, Rossmann-related, and helix-turn-helix (HTH) H-groups are the most populated H-groups in ECOD. The inset shows the most populated H-groups by number of F-groups. We compared our H-groups to SCOP superfamilies and folds by considering sequence and structure similarity of domain pairs within each level. ECOD manual representatives and ASTRAL40 domains were evaluated by HHsearch to reflect sequence similarity and TMalign to reflect structure similarity [21], [40]. SCOP superfamilies tend to contain more close homologs that can be detected by sequence homology search methods than ECOD H-groups (Fig. 7(c)). Domains classified in SCOP folds (excluding pairs from the same superfamily) emphasize structural similarity, as the distribution is mostly populated in the low sequence similarity region and the peak shifts right compared with others (Fig. 7(a,b)). On the other hand, as ECOD H-group readily incorporates homologous links from SCOP superfamilies and also many remotely homologous relationships that were previously overlooked, its peak sizes lie between SCOP fold and superfamily in high and low sequence similarity regions. 
Also it is worth noting that the peak of ECOD H-group does not have the right shoulder in the intermediate sequence similarity group but has a relatively evident left shoulder in the high sequence similarity group (Fig. 7(b,c)), which potentially supports the idea that ECOD classification is homology-centric. 10.1371/journal.pcbi.1003926.g007 Figure 7 Structure similarity distribution of domain pairs from SCOP superfamily, SCOP fold and ECOD H-group, measured by TM-score. Data were grouped into three panels by sequence similarity in terms of HHsearch probability (Low: probability ≤20%, Medium: 20%

Structure of the parainfluenza virus 5 F protein in its metastable, prefusion conformation

Hsien-Sheng Yin, Xiaolin Wen, Reay G. Paterson … (2006)

Enveloped viruses have evolved complex glycoprotein machinery that drives the fusion of viral and cellular membranes, permitting entry of the viral genome into the cell. For the paramyxoviruses, the fusion (F) protein catalyses this membrane merger and entry step, and it has been postulated that the F protein undergoes complex refolding during this process. Here we report the crystal structure of the parainfluenza virus 5 F protein in its prefusion conformation, stabilized by the addition of a carboxy-terminal trimerization domain. The structure of the F protein shows that there are profound conformational differences between the pre- and postfusion states, involving transformations in secondary and tertiary structure. The positions and structural transitions of key parts of the fusion machinery, including the hydrophobic fusion peptide and two helical heptad repeat regions, clarify the mechanism of membrane fusion mediated by the F protein.

Structural basis for immunization with postfusion respiratory syncytial virus fusion F glycoprotein (RSV F) to elicit high neutralizing antibody titers.

Andrea Carfi, Sumit Dey, René Mandl … (2011)

Respiratory syncytial virus (RSV), the main cause of infant bronchiolitis, remains a major unmet vaccine need despite more than 40 years of vaccine research. Vaccine candidates based on a chief RSV neutralization antigen, the fusion (F) glycoprotein, have foundered due to problems with stability, purity, reproducibility, and potency. Crystal structures of related parainfluenza F glycoproteins have revealed a large conformational change between the prefusion and postfusion states, suggesting that postfusion F antigens might not efficiently elicit neutralizing antibodies. We have generated a homogeneous, stable, and reproducible postfusion RSV F immunogen that elicits high titers of neutralizing antibodies in immunized animals. The 3.2-Å X-ray crystal structure of this substantially complete RSV F reveals important differences from homology-based structural models. Specifically, the RSV F crystal structure demonstrates the exposure of key neutralizing antibody binding sites on the surface of the postfusion RSV F trimer. This unanticipated structural feature explains the engineered RSV F antigen's efficiency as an immunogen. This work illustrates how structural-based antigen design can guide the rational optimization of candidate vaccine antigens.

Author and article information

ISSN: 0961-8368

Volume: 24

Issue: 7

Pages: 1075-1086

DOI: 10.1002/pro.2689

PubMed ID: 25970262

License: http://doi.wiley.com/10.1002/tdm_license_1.1
