Chain Alignment of Collagen I Deciphered using Computationally Designed Heterotrimers


Abstract

The most abundant member of the collagen protein family, collagen I (COL1), is composed of two similar (chain A) and one unique (chain B) polypeptides that self-assemble with a one-amino-acid offset into a heterotrimeric triple helix. Given the offset, chain B can occupy the leading (BAA), middle (ABA) or trailing (AAB) position of the triple helix, yielding three isomeric biomacromolecules with different protein recognition properties. Despite five decades of intensive research, there is no consensus on the position of chain B in COL1. Here, three triple-helical heterotrimers that each contain a putative von Willebrand factor (VWF) and discoidin domain receptor (DDR) recognition sequence from COL1 were designed with chain B permutated in all three positions. AAB demonstrated a strong preference for both VWF and DDR and also induced higher levels of cellular DDR phosphorylation. Thus, we resolve this long-standing mystery and show that COL1 adopts an AAB register.

Most cited references 36

An introduction to data reduction: space-group determination, scaling and intensity statistics

Philip R Evans, Piyush Kalakoti (2011)

1. Introduction

Estimates of integrated intensities from X-ray diffraction images are not generally suitable for immediate use in structure determination. Theoretically, the measured intensity $I_h$ of a reflection $h$ is proportional to the square of the underlying structure factor, $|F_h|^2$, which is the quantity that we want, with an associated measurement error, but systematic effects of the diffraction experiment break this proportionality. Such systematic effects include changes in the beam intensity, changes in the exposed volume of the crystal, radiation damage, bad areas of the detector and physical obstruction of the detector (e.g. by the backstop or cryostream). If data from different crystals (or different sweeps of the same crystal) are being merged, corrections must also be applied for changes in exposure time and rotation rate. In order to infer $|F_h|^2$ from $I_h$, we need to put the measured intensities on the same scale by modelling the experiment and inverting its effects. This is generally performed in a scaling process that makes the data internally consistent by adjusting the scaling model to minimize the difference between symmetry-related observations. This process requires us to know the point-group symmetry of the diffraction pattern, so we need to determine this symmetry prior to scaling. The scaling process produces an estimate of the intensity of each unique reflection by averaging over all of the corrected intensities, together with an estimate of its error $\sigma(I_h)$. The final stage in data reduction is estimation of the structure amplitude $|F_h|$ from the intensity, which is approximately $I_h^{1/2}$ (but with a skewing factor for intensities that are below or close to background noise, e.g. ‘negative’ intensities); at the same time, the intensity statistics can be examined to detect pathologies such as twinning.
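As a rough illustration of the averaging step described above, the following Python sketch merges several already-scaled observations of one unique reflection into a single intensity and an error estimate. The inverse-variance weighting and the name `merge_observations` are assumptions made for this example; the text only says that the corrected intensities are averaged, and this is not presented as the method used by any CCP4 program.

```python
import math


def merge_observations(intensities, sigmas):
    """Merge scaled observations of one unique reflection.

    Uses an inverse-variance weighted mean (an assumption of this sketch)
    and returns (merged_intensity, sigma_of_merged_intensity).
    """
    if len(intensities) != len(sigmas) or not intensities:
        raise ValueError("need equal-length, non-empty lists")
    # Each observation is weighted by 1/sigma^2.
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    merged = sum(w * i for w, i in zip(weights, intensities)) / total
    # Standard error of an inverse-variance weighted mean.
    return merged, math.sqrt(1.0 / total)


# Two symmetry-related observations with equal error estimates.
merged_i, merged_sigma = merge_observations([100.0, 110.0], [10.0, 10.0])
print(merged_i, merged_sigma)
```

With equal error estimates the weighted mean reduces to the ordinary mean, and the merged error shrinks by the square root of the number of observations.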
This paper presents a brief overview of how to run CCP4 programs for data reduction through the CCP4 graphical interface ccp4i and points out some issues that need to be considered. No attempt is made to be comprehensive nor to provide full references for everything. Automated pipelines such as xia2 (Winter, 2010) are often useful and generally work well, but sometimes in difficult cases finer control is needed. In the current version of ccp4i (CCP4 release 6.1.3) the ‘Data Reduction’ module contains two major relevant tasks: ‘Find or Match Laue Group’, which determines the crystal symmetry, and ‘Scale and Merge Intensities’, which outputs a file containing averaged structure amplitudes. Future GUI versions may combine these steps into a simplified interface. Much of the advice given here is also present in the CCP4 wiki (http://www.ccp4wiki.org/).

2. Space-group determination

The true space group is only a hypothesis until the structure has been solved, since it can be hard to distinguish between exact crystallographic symmetry and approximate noncrystallographic symmetry. However, it is useful to find the likely symmetry early on in the structure-determination pipeline, since it is required for scaling and indeed may affect the data-collection strategy. The program POINTLESS (Evans, 2006) examines the symmetry of the diffraction pattern and scores the possible crystallographic symmetry. Indexing in the integration program (e.g. MOSFLM) only indicates the lattice symmetry, i.e. the geometry of the lattice giving constraints on the cell dimensions (e.g. α = β = γ = 90° for an orthorhombic lattice), but such relationships can arise accidentally and may not reflect the true symmetry. For example, a primitive hexagonal lattice may belong to point groups 3, 321, 312, 6, 622 or indeed lower symmetry (C222, 2 or 1).
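To make the point about lattice geometry concrete, the sketch below lists the crystal systems whose cell-dimension constraints a given unit cell satisfies within a tolerance. It is a simplified illustration under stated assumptions: the function name `compatible_lattices` and the tolerance values are hypothetical, only conventional settings are tested (monoclinic with unique axis b, tetragonal and hexagonal with unique axis c), and rhombohedral and centred lattices are ignored. It is not the algorithm used by POINTLESS or MOSFLM.

```python
def compatible_lattices(a, b, c, alpha, beta, gamma, len_tol=0.02, ang_tol=1.0):
    """Return crystal systems whose metric constraints the cell satisfies.

    len_tol is a fractional tolerance on cell lengths and ang_tol is in
    degrees; both values are illustrative assumptions.
    """
    def same(x, y):
        return abs(x - y) <= len_tol * max(x, y)

    def ang(x, target):
        return abs(x - target) <= ang_tol

    all_90 = ang(alpha, 90) and ang(beta, 90) and ang(gamma, 90)
    systems = ["triclinic"]  # no constraints at all
    if ang(alpha, 90) and ang(gamma, 90):
        systems.append("monoclinic")  # unique axis b only
    if all_90:
        systems.append("orthorhombic")
        if same(a, b):
            systems.append("tetragonal")  # unique axis c only
        if same(a, b) and same(b, c):
            systems.append("cubic")
    if same(a, b) and ang(alpha, 90) and ang(beta, 90) and ang(gamma, 120):
        systems.append("hexagonal")
    return systems


# A cell with three nearly equal axes and 90 degree angles passes the cubic
# test on geometry alone, whatever the true symmetry is.
print(compatible_lattices(80.0, 80.5, 81.0, 90.0, 90.0, 90.0))
```

A cell passing a geometric test here says nothing about whether the diffraction intensities obey that symmetry, which is why the symmetry of the intensities has to be scored separately.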
A rotational axis of symmetry produces identical true intensities for reflections related by that axis, so examination of the observed symmetry in the diffraction pattern allows us to determine the likely point group and hence the Laue group (a point group with added Friedel symmetry) and the Patterson group (with any lattice centring): note that the Patterson group is labelled ‘Laue group’ in the output from POINTLESS. Translational symmetry operators that define the space group (e.g. the distinction between a pure dyad and a screw dyad) are only visible in the observed diffraction pattern as systematic absences, along the principal axes for screws, and these are less reliable indicators since there are relatively few axial reflections in a full three-dimensional data set and some of these may be unrecorded. The protocol for determination of space group in POINTLESS is as follows.

(i) From the unit-cell dimensions and lattice centring, find the highest compatible lattice symmetry within some tolerance, ignoring any input symmetry information.
(ii) Score each potential rotational symmetry element belonging to the lattice symmetry using all pairs of observations related by that element.
(iii) Score combinations of symmetry elements for all possible subgroups of the lattice-symmetry group (Laue or Patterson groups).
(iv) Score possible space groups from axial systematic absences (the space group is not needed for scaling but is required later for structure solution).
(v) Scores for rotational symmetry operations are based on correlation coefficients rather than R factors, since they are less dependent on the unknown scales. A probability is estimated from the correlation coefficient, using equivalent-size samples of unrelated observations to estimate the width of the probability distribution (see Appendix A).

2.1. A simple example

POINTLESS may be run from the ‘Data Reduction’ module of ccp4i with the task ‘Find or Match Laue Group’ or from the ‘QuickSymm’ option of the iMOSFLM interface (Battye et al., 2011). Unless the space group is known from previous crystals, the appropriate major option is ‘Determine Laue group’. To use this, fill in the boxes for the title, the input and output file names and the project, crystal and data-set names (if not already set in MOSFLM). Table 1 shows the results for a straightforward example in space group $P2_12_12_1$. Table 1(a) shows the scores for the three possible dyad axes in the orthorhombic lattice, all of which are clearly present. Combining these (Table 1b) shows that the Laue group is mmm with a primitive lattice, Patterson group Pmmm. Fourier analysis of systematic absences along the three principal axes shows that all three have alternating strong (even) and weak (odd) intensities (Fig. 1 and Table 1c), so are likely to be screw axes, implying that the space group is $P2_12_12_1$. However, there are only three h00 reflections recorded along the a* axis, so confidence in the space-group assignment is not as high as the confidence in the Laue-group assignment (Table 1d). With so few observations along this axis, it is impossible to be confident that $P2_12_12_1$ is the true space group rather than $P22_12_1$.

2.2. A pseudo-cubic example

Table 2 shows the scores for individual symmetry elements for a pseudo-cubic case with a ≃ b ≃ c. It is clear that only the orthorhombic symmetry elements are present: these are the high-scoring elements marked ‘***’. Neither the fourfolds characteristic of tetragonal groups nor the body-diagonal threefolds (along 111 etc.) characteristic of cubic groups are present. The joint probability score for the Laue group Pmmm is 0.989. The suggested solution (not shown) interchanges k and l to make a […] 1 if the anomalous differences are on average greater than their error.
Another way of detecting a significant anomalous signal is to compare the two estimates of $\Delta I_{\mathrm{anom}}$ from random half data sets, $\Delta I_1$ and $\Delta I_2$ (provided there are at least two measurements of each, i.e. a multiplicity of roughly 4). Figs. 5(b) and 5(f) show the correlation coefficient between $\Delta I_1$ and $\Delta I_2$ as a function of resolution: Fig. 5(f) shows little statistical significance beyond about 4.5 Å resolution. Figs. 5(c) and 5(g) show scatter plots of $\Delta I_1$ against $\Delta I_2$: this plot is elongated along the diagonal if there is a large anomalous signal and this can be quantitated as the ‘r.m.s. correlation ratio’, which is defined as (root-mean-square deviation along the diagonal)/(root-mean-square deviation perpendicular to the diagonal) and is shown as a function of resolution in Figs. 5(d) and 5(h). The plots against resolution give a suggestion of where the data might be cut for substructure determination, but it is important to note that useful albeit weak phase information extends well beyond the point at which these statistics show a significant signal.

5. Estimation of amplitude |F| from intensity I

If we knew the true intensity $J$ we could just take the square root, $|F| = J^{1/2}$. However, measured intensities have an error, so a weak intensity may well be measured as negative (i.e. below background); indeed, multiple measurements of a true intensity of zero should be equally positive and negative. This is one reason why when possible it is better to use $I$ rather than $|F|$ in structure determination and refinement. The ‘best’ (most likely) estimate of $|F|$ is larger than $I^{1/2}$ for weak intensities, since we know $|F| > 0$, but $|F| = I^{1/2}$ is a good estimate for stronger intensities, roughly those with $I > 3\sigma(I)$. The programs TRUNCATE and its newer version CTRUNCATE estimate $|F|$ from $I$ and $\sigma(I)$ as […], where the prior probability of the true intensity $p(J)$ is estimated from the average intensity in the same resolution range (French & Wilson, 1978).

6. Intensity statistics and crystal pathologies

At the end stage of data reduction, after scaling and merging, the distribution of intensities and its variation with resolution can indicate problems with the data, notably twinning (see, for example, Lebedev et al., 2006; Zwart et al., 2008). The simplest expected intensity statistics as a function of resolution $s = \sin\theta/\lambda$ arise from assuming that atoms are randomly placed in the unit cell, in which case $\langle I\rangle(s) = \langle FF^*\rangle(s) = \sum_j g(j, s)^2$, where $g(j, s)$ is the scattering from the $j$th atom at resolution $s$. This average intensity falls off with resolution mainly because of atomic motions (B factors). If all atoms were equal and had equal B factors, then $\langle I\rangle(s) = C\exp(-2Bs^2)$ and the ‘Wilson plot’ of $\log[\langle I\rangle(s)]$ against $s^2$ would be a straight line of slope $-2B$. The Wilson plot for proteins shows peaks at ∼10 and 4 Å and a dip at ∼6 Å arising from the distribution of interatomic spacings in polypeptides (fewer atoms 6 Å apart than 4 Å apart), but the slope at higher resolution does give an indication of the average B factor and an unusual shape can indicate a problem (e.g. $\langle I\rangle$ increasing at the outer limit, spuriously large $\langle I\rangle$ owing to ice rings etc.). For detection of crystal pathologies we are not so interested in resolution dependence, so we can use normalized intensities $Z = I/\langle I\rangle(s) \simeq |E|^2$ which are independent of resolution and should ideally be corrected for anisotropy (as is performed in CTRUNCATE). Two useful statistics on $Z$ are plotted by CTRUNCATE: the moments of $Z$ as a function of resolution and its cumulative distribution. While $\langle Z\rangle(s) = 1.0$ by definition, its second moment $\langle Z^2\rangle(s)$ (equivalent to the fourth moment of $E$) is >1.0 and is larger if the distribution of $Z$ is wider. The ideal value of $\langle E^4\rangle$ is 2.0, but it will be smaller for the narrower intensity distribution from a merohedral twin (too few weak reflections), equal to 1.5 for a perfect twin and larger if there are too many weak reflections, e.g. from a noncrystallographic translation which leads to a whole class of reflections being weak. The cumulative distribution plot of $N(z)$, the fraction of reflections with $Z$ […] $|L|$ and $N(|L|) = |L|(3 - L^2)/2$ for a perfect twin. This test seems to be largely unaffected by anisotropy or translational noncrystallographic symmetry which may affect tests on $Z$. The calculation of $Z = I/\langle I\rangle(s)$ depends on using a suitable value for $\langle I\rangle(s)$ and noncrystallographic translations or uncorrected anisotropy lead to the use of an inappropriate value for $\langle I\rangle(s)$. These statistical tests are all unweighted, so it may be better to exclude weak high-resolution data or to examine the resolution dependence of, for example, the moments of $Z$ (or possibly $L$). It is also worth noting that fewer weak reflections than expected may arise from unresolved closely spaced spots along a long real-space axis, so that weak reflections are contaminated by neighbouring strong reflections, thus mimicking the effect of twinning.

7. Summary: questions and decisions

In the process of data reduction, a number of decisions need to be taken either by the programs or by the user. The main questions and considerations are as follows.

(i) What is the point group or Laue group? This is usually unambiguous, but pseudosymmetry may confuse the programs and the user. Close examination of the scores for individual symmetry elements from POINTLESS may suggest lower symmetry groups to try.
(ii) What is the space group? Distinction between screw axes and pure rotations from axial systematic absences is often unreliable and it is generally a good idea to try all the likely space groups (consistent with the Laue group) in the key structure-solution step: either molecular-replacement searches or substructure searches in experimental phasing. For example, in a primitive orthorhombic system the eight possible groups $P2_x2_x2_x$ should be tried.
This has the added advantage of providing some negative controls on the success of the structure solution.
(iii) Is there radiation damage: should data collected after the crystal has had a high dose of radiation be ignored (possibly at the expense of resolution)? Cutting back data from the end may reduce completeness and the optimum trade-off is hard to choose.
(iv) What is the best resolution cutoff? An appropriate choice of resolution cutoff is difficult and sometimes seems to be performed mainly to satisfy referees. On the one hand, cutting back too far risks excluding data that do contain some useful information. On the other hand, extending the resolution further makes all statistics look worse and may in the end degrade maps. The choice is perhaps not as important as is sometimes thought: maps calculated with slightly different resolution cutoffs are almost indistinguishable.
(v) Is there an anomalous signal detectable in the intensity statistics? Note that a weak anomalous signal may still be useful even if it is not detectable in the statistics. The statistics do give a good guide to a suitable resolution limit for location of the substructure, but the whole resolution range should be used in phasing.
(vi) Are the data twinned? Highly twinned data sets can be solved by molecular replacement and refined, but probably not solved by experimental phasing methods. Partially twinned data sets can often be solved by ignoring the twinning and then refined as a twin.
(vii) Is this data set better or worse than those previously collected? One of the best things to do with a bad data set is to throw it away in favour of a better one. With modern synchrotrons, data collection is so fast that we usually have the freedom to collect data from several equivalent crystals and choose the best.
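The twinning question can be related back to the second moment of the normalized intensity Z discussed in the section on intensity statistics, where the ideal value is 2.0 and a perfect twin gives 1.5. The sketch below computes that moment for simulated data. It rests on several assumptions that are not from the text: the names `second_moment_z` and `simulate` are hypothetical, untwinned acentric intensities are drawn from an exponential distribution, a perfect twin is modelled as the mean of two independent intensities, and all reflections are treated as one resolution shell with no anisotropy correction.

```python
import random


def second_moment_z(intensities):
    """Return <Z^2> for Z = I/<I>, using a single shell-wide mean intensity."""
    mean_i = sum(intensities) / len(intensities)
    return sum((i / mean_i) ** 2 for i in intensities) / len(intensities)


def simulate(n, twinned, seed=0):
    """Draw n simulated acentric intensities; a perfect twin averages two."""
    rng = random.Random(seed)
    if twinned:
        return [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
                for _ in range(n)]
    return [rng.expovariate(1.0) for _ in range(n)]


untwinned = second_moment_z(simulate(100_000, twinned=False))
twinned = second_moment_z(simulate(100_000, twinned=True))
print(f"untwinned <Z^2> ~ {untwinned:.2f}, perfect twin <Z^2> ~ {twinned:.2f}")
```

On real data the moment would be examined as a function of resolution, since weak high-resolution shells can distort an unweighted statistic of this kind.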
In most cases the data-reduction process is straightforward, but in difficult cases critical examination of the results may make the difference between solving and not solving the structure.
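For the anomalous-signal question, the two half-data-set statistics described earlier (the correlation coefficient between the two anomalous-difference estimates and the ‘r.m.s. correlation ratio’) can be sketched as follows. The function names are hypothetical, and measuring the root-mean-square deviations about the mean of each projected coordinate is an assumption of this example, since the text does not state the reference point.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rms_correlation_ratio(d1, d2):
    """(r.m.s. deviation along the diagonal) / (r.m.s. deviation across it).

    Points (d1, d2) are projected onto the diagonal and onto its
    perpendicular. Raises ZeroDivisionError if the points lie exactly on a
    line parallel to the diagonal.
    """
    root2 = math.sqrt(2.0)
    along = [(a + b) / root2 for a, b in zip(d1, d2)]
    across = [(a - b) / root2 for a, b in zip(d1, d2)]

    def rms_dev(values):
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    return rms_dev(along) / rms_dev(across)


# Made-up anomalous differences from two half data sets (illustration only).
half1 = [1.0, 2.0, 3.0, 4.0]
half2 = [1.1, 1.9, 3.2, 3.8]
print(pearson_cc(half1, half2), rms_correlation_ratio(half1, half2))
```

A ratio close to 1 corresponds to a round scatter plot with no detectable signal, while a ratio well above 1 corresponds to the elongation along the diagonal described in the text.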

Cited 510 times

Dual-surface modification of the tobacco mosaic virus.

T Schlick, Z Ding, W Kovacs … (2005)

The protein shell of the tobacco mosaic virus (TMV) provides a robust and practical tubelike scaffold for the preparation of nanoscale materials. To expand the range of applications for which the capsid can be used, two synthetic strategies have been developed for the attachment of new functionality to either the exterior or the interior surface of the virus. The first of these is accomplished using a highly efficient diazonium coupling/oxime formation sequence, which installs >2000 copies of a material component on the capsid exterior. Alternatively, the inner cavity of the tube can be modified by attaching amines to glutamic acid side chains through a carbodiimide coupling reaction. Both of these reactions have been demonstrated for a series of substrates, including biotin, chromophores, and crown ethers. Through the attachment of PEG polymers to the capsid exterior, organic-soluble TMV rods have been prepared. Finally, the orthogonality of these reactions has been demonstrated by installing different functional groups on the exterior and interior surfaces of the same capsid assemblies.

Cited 89 times

Discoidin domain receptor 1 tyrosine kinase has an essential role in mammary gland development.

W Bruce Vogel, Catherine Pawson, A Aszódi … (2001)

Various types of collagen have been identified as potential ligands for the two mammalian discoidin domain receptor tyrosine kinases, DDR1 and DDR2. Here, we used a recombinant fusion protein between the extracellular domain of DDR1 and alkaline phosphatase to detect specific receptor binding sites during mouse development. Major sites of DDR1-binding activity, indicative of ligand expression, were found in skeletal bones, the skin, and the urogenital tract. Ligand expression in the uterus during implantation and in the mammary gland during pregnancy colocalized with the expression of the DDR1 receptor. The generation of DDR1-null mice by gene targeting yielded homozygous mutant animals that were viable but smaller in size than control littermates. The majority of mutant females were unable to bear offspring due to a lack of proper blastocyst implantation into the uterine wall. When implantation did occur, the mutant females were unable to lactate. Histological analysis showed that the alveolar epithelium failed to secrete milk proteins into the lumen of the mammary gland. The lactational defect appears to be caused by hyperproliferation and abnormal branching of mammary ducts. These results suggest that DDR1 is a key mediator of the stromal-epithelial interaction during ductal morphogenesis in the mammary gland.

Cited 61 times

Author and article information

Journal

Journal ID (nlm-journal-id): 101231976

Journal ID (nlm-ta): Nat Chem Biol

Journal ID (iso-abbrev): Nat. Chem. Biol.

Title: Nature chemical biology

ISSN (Print): 1552-4450

ISSN (Electronic): 1552-4469

Publication date Nihms-submitted: 21 November 2019

Publication date (Electronic): 06 January 2020

Publication date (Print): April 2020

Publication date PMC-release: 06 July 2020

Volume: 16

Issue: 4

Pages: 423-429

Affiliations

[1] Department of Biochemistry, University of Cambridge, Cambridge, UK

[2] National Heart and Lung Institute, Imperial College London, London, UK

[3] Department of Chemistry and Bioengineering, Rice University, Houston, USA

Author notes

[*] Correspondence and request for materials should be addressed to A.A.J. jalan@ 123456cantab.net

[5] Present address: Department of Biochemistry, University of Bayreuth, Bayreuth, Germany

Article

Manuscript ID: EMS84997

DOI: 10.1038/s41589-019-0435-y

PMC ID: 7100791

PubMed ID: 31907373

SO-VID: 7fa69bbf-9e5d-4000-a9a2-de0522477dd0

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
