      High-resolution structure determination by continuous rotation data collection in MicroED

      research-article

          Abstract

          MicroED uses very small three-dimensional protein crystals and electron diffraction for structure determination. An improved data collection protocol for MicroED called “continuous rotation” is presented. Here, microcrystals are continuously rotated during data collection, yielding improved data and allowing data processing with MOSFLM, which results in improved resolution for the model protein lysozyme. These improvements pave the way for the implementation and application of MicroED with wide applicability in structural biology.

          Most cited references (17)

          An introduction to data reduction: space-group determination, scaling and intensity statistics

          1. Introduction Estimates of integrated intensities from X-ray diffraction images are not generally suitable for immediate use in structure determination. Theoretically, the measured intensity $I_h$ of a reflection h is proportional to the square of the underlying structure factor $|F_h|^2$, which is the quantity that we want, with an associated measurement error, but systematic effects of the diffraction experiment break this proportionality. Such systematic effects include changes in the beam intensity, changes in the exposed volume of the crystal, radiation damage, bad areas of the detector and physical obstruction of the detector (e.g. by the backstop or cryostream). If data from different crystals (or different sweeps of the same crystal) are being merged, corrections must also be applied for changes in exposure time and rotation rate. In order to infer $|F_h|^2$ from $I_h$, we need to put the measured intensities on the same scale by modelling the experiment and inverting its effects. This is generally performed in a scaling process that makes the data internally consistent by adjusting the scaling model to minimize the difference between symmetry-related observations. This process requires us to know the point-group symmetry of the diffraction pattern, so we need to determine this symmetry prior to scaling. The scaling process produces an estimate of the intensity of each unique reflection by averaging over all of the corrected intensities, together with an estimate of its error $\sigma(I_h)$. The final stage in data reduction is estimation of the structure amplitude $|F_h|$ from the intensity, which is approximately $I_h^{1/2}$ (but with a skewing factor for intensities that are below or close to background noise, e.g. ‘negative’ intensities); at the same time, the intensity statistics can be examined to detect pathologies such as twinning.
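The averaging step described above, in which each unique reflection receives a merged intensity and an error estimate, can be illustrated with a minimal sketch. It assumes that the observations have already been scaled and reduced to a common symmetry-unique index, and it uses inverse-variance weighting, which is a choice made for this illustration and not necessarily what the CCP4 programs do. The function name `merge_observations` is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def merge_observations(observations):
    """Merge scaled observations of each unique reflection.

    observations: iterable of (hkl, intensity, sigma) tuples, where hkl is
    assumed to be the symmetry-reduced index already (an assumption of this
    sketch; no symmetry handling or scaling is done here).
    Returns {hkl: (weighted_mean_intensity, sigma_of_mean)}.
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[hkl].append((intensity, sigma))

    merged = {}
    for hkl, obs in groups.items():
        # Inverse-variance weights: precise observations count for more.
        weights = [1.0 / (sigma * sigma) for _, sigma in obs]
        total_weight = sum(weights)
        mean_intensity = sum(w * i for w, (i, _) in zip(weights, obs)) / total_weight
        merged[hkl] = (mean_intensity, sqrt(1.0 / total_weight))
    return merged


if __name__ == "__main__":
    example = [
        ((1, 0, 0), 100.0, 10.0),
        ((1, 0, 0), 110.0, 10.0),
        ((0, 1, 0), 50.0, 5.0),
    ]
    for hkl, (mean_i, sig) in merge_observations(example).items():
        print(hkl, round(mean_i, 2), round(sig, 2))
```

Negative merged intensities are kept as they are, since the conversion to amplitudes is a separate later step.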
This paper presents a brief overview of how to run CCP4 programs for data reduction through the CCP4 graphical interface ccp4i and points out some issues that need to be considered. No attempt is made to be comprehensive nor to provide full references for everything. Automated pipelines such as xia2 (Winter, 2010) are often useful and generally work well, but sometimes in difficult cases finer control is needed. In the current version of ccp4i (CCP4 release 6.1.3) the ‘Data Reduction’ module contains two major relevant tasks: ‘Find or Match Laue Group’, which determines the crystal symmetry, and ‘Scale and Merge Intensities’, which outputs a file containing averaged structure amplitudes. Future GUI versions may combine these steps into a simplified interface. Much of the advice given here is also present in the CCP4 wiki (http://www.ccp4wiki.org/). 2. Space-group determination The true space group is only a hypothesis until the structure has been solved, since it can be hard to distinguish between exact crystallographic symmetry and approximate noncrystallographic symmetry. However, it is useful to find the likely symmetry early on in the structure-determination pipeline, since it is required for scaling and indeed may affect the data-collection strategy. The program POINTLESS (Evans, 2006) examines the symmetry of the diffraction pattern and scores the possible crystallographic symmetry. Indexing in the integration program (e.g. MOSFLM) only indicates the lattice symmetry, i.e. the geometry of the lattice giving constraints on the cell dimensions (e.g. α = β = γ = 90° for an orthorhombic lattice), but such relationships can arise accidentally and may not reflect the true symmetry. For example, a primitive hexagonal lattice may belong to point groups 3, 321, 312, 6, 622 or indeed lower symmetry (C222, 2 or 1).
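The point that cell dimensions only suggest a lattice symmetry can be made concrete with a small sketch. The hypothetical function `metric_lattice_candidates` below lists which of the 90° lattice families a cell is metrically compatible with, within tolerances chosen arbitrarily for this illustration. It is deliberately simplified: it assumes a primitive cell in a conventional setting, ignores centring and hexagonal cells, and is not how MOSFLM or POINTLESS perform this step.

```python
def metric_lattice_candidates(a, b, c, alpha, beta, gamma,
                              len_tol=0.01, ang_tol=0.5):
    """List lattice families whose metric constraints the cell satisfies.

    len_tol is a fractional tolerance on cell lengths and ang_tol a tolerance
    in degrees on cell angles; both values are assumptions of this sketch.
    Only the families with 90 degree angles are tested, with the unique
    monoclinic axis taken as b and the unique tetragonal axis as c.
    """
    def same_length(x, y):
        return abs(x - y) <= len_tol * max(x, y)

    def right_angle(angle):
        return abs(angle - 90.0) <= ang_tol

    candidates = ["triclinic"]  # no metric constraint, always compatible
    if right_angle(alpha) and right_angle(gamma):
        candidates.append("monoclinic")
    if right_angle(alpha) and right_angle(beta) and right_angle(gamma):
        candidates.append("orthorhombic")
        if same_length(a, b):
            candidates.append("tetragonal")
            if same_length(b, c):
                candidates.append("cubic")
    return candidates


if __name__ == "__main__":
    # A cell with a, b and c nearly equal passes the cubic metric test,
    # although the true symmetry may be lower.
    print(metric_lattice_candidates(78.0, 78.1, 78.05, 90.0, 90.0, 90.0))
```

A cell that passes the cubic metric test is still only a candidate; whether the higher symmetry is real has to be decided from the intensities.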
A rotational axis of symmetry produces identical true intensities for reflections related by that axis, so examination of the observed symmetry in the diffraction pattern allows us to determine the likely point group and hence the Laue group (a point group with added Friedel symmetry) and the Patterson group (with any lattice centring): note that the Patterson group is labelled ‘Laue group’ in the output from POINTLESS. Translational symmetry operators that define the space group (e.g. the distinction between a pure dyad and a screw dyad) are only visible in the observed diffraction pattern as systematic absences, along the principal axes for screws, and these are less reliable indicators since there are relatively few axial reflections in a full three-dimensional data set and some of these may be unrecorded. The protocol for determination of space group in POINTLESS is as follows. (i) From the unit-cell dimensions and lattice centring, find the highest compatible lattice symmetry within some tolerance, ignoring any input symmetry information. (ii) Score each potential rotational symmetry element belonging to the lattice symmetry using all pairs of observations related by that element. (iii) Score combinations of symmetry elements for all possible subgroups of the lattice-symmetry group (Laue or Patterson groups). (iv) Score possible space groups from axial systematic absences (the space group is not needed for scaling but is required later for structure solution). (v) Scores for rotational symmetry operations are based on correlation coefficients rather than R factors, since they are less dependent on the unknown scales. A probability is estimated from the correlation coefficient, using equivalent-size samples of unrelated observations to estimate the width of the probability distribution (see Appendix A ). 2.1. 
A simple example POINTLESS may be run from the ‘Data Reduction’ module of ccp4i with the task ‘Find or Match Laue Group’ or from the ‘QuickSymm’ option of the iMOSFLM interface (Battye et al., 2011). Unless the space group is known from previous crystals, the appropriate major option is ‘Determine Laue group’. To use this, fill in the boxes for the title, the input and output file names and the project, crystal and data-set names (if not already set in MOSFLM). Table 1 shows the results for a straightforward example in space group $P2_12_12_1$. Table 1(a) shows the scores for the three possible dyad axes in the orthorhombic lattice, all of which are clearly present. Combining these (Table 1b) shows that the Laue group is mmm with a primitive lattice, Patterson group Pmmm. Fourier analysis of systematic absences along the three principal axes shows that all three have alternating strong (even) and weak (odd) intensities (Fig. 1 and Table 1c), so are likely to be screw axes, implying that the space group is $P2_12_12_1$. However, there are only three h00 reflections recorded along the a* axis, so confidence in the space-group assignment is not as high as the confidence in the Laue-group assignment (Table 1d). With so few observations along this axis, it is impossible to be confident that $P2_12_12_1$ is the true space group rather than $P22_12_1$. 2.2. A pseudo-cubic example Table 2 shows the scores for individual symmetry elements for a pseudo-cubic case with a ≃ b ≃ c. It is clear that only the orthorhombic symmetry elements are present: these are the high-scoring elements marked ‘***’. Neither the fourfolds characteristic of tetragonal groups nor the body-diagonal threefolds (along 111 etc.) characteristic of cubic groups are present. The joint probability score for the Laue group Pmmm is 0.989. The suggested solution (not shown) interchanges k and l to make a 1 if the anomalous differences are on average greater than their error.
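The axial systematic-absence test from the simple example in section 2.1 can be sketched for a twofold screw axis. For a $2_1$ axis only even-index axial reflections are present, so the Fourier component of the axial intensities at frequency 1/2 reduces to the even sum minus the odd sum. The hypothetical function `even_odd_contrast` below returns that component relative to the total. It is an illustration only: it produces no probability, and it is not the scoring used by POINTLESS.

```python
def even_odd_contrast(axial_reflections):
    """Contrast between even- and odd-index reflections along one axis.

    axial_reflections: list of (index, intensity) pairs, e.g. (h, I) for the
    h00 reflections. Returns (sum_even - sum_odd) / (sum_even + sum_odd):
    close to 1 when odd reflections are absent (consistent with a twofold
    screw axis) and close to 0 when there is no alternation.
    """
    sum_even = sum(i for index, i in axial_reflections if index % 2 == 0)
    sum_odd = sum(i for index, i in axial_reflections if index % 2 != 0)
    total = sum_even + sum_odd
    if total <= 0:
        raise ValueError("total axial intensity must be positive")
    return (sum_even - sum_odd) / total


if __name__ == "__main__":
    # Made-up intensities for illustration, not data from the paper.
    strong_even_weak_odd = [(2, 100.0), (3, 1.0), (4, 80.0), (5, 2.0), (6, 120.0)]
    print(round(even_odd_contrast(strong_even_weak_odd), 3))
```

With only three recorded reflections along an axis, as in the a* case above, such a contrast rests on very little data, which is why the space-group assignment is less certain than the Laue-group assignment.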
Another way of detecting a significant anomalous signal is to compare the two estimates of $\Delta I_{\mathrm{anom}}$ from random half data sets, $\Delta I_1$ and $\Delta I_2$ (provided there are at least two measurements of each, i.e. a multiplicity of roughly 4). Figs. 5(b) and 5(f) show the correlation coefficient between $\Delta I_1$ and $\Delta I_2$ as a function of resolution: Fig. 5(f) shows little statistical significance beyond about 4.5 Å resolution. Figs. 5(c) and 5(g) show scatter plots of $\Delta I_1$ against $\Delta I_2$: this plot is elongated along the diagonal if there is a large anomalous signal and this can be quantitated as the ‘r.m.s. correlation ratio’, which is defined as (root-mean-square deviation along the diagonal)/(root-mean-square deviation perpendicular to the diagonal) and is shown as a function of resolution in Figs. 5(d) and 5(h). The plots against resolution give a suggestion of where the data might be cut for substructure determination, but it is important to note that useful albeit weak phase information extends well beyond the point at which these statistics show a significant signal. 5. Estimation of amplitude |F| from intensity I If we knew the true intensity J we could just take the square root, $|F| = J^{1/2}$. However, measured intensities have an error, so a weak intensity may well be measured as negative (i.e. below background); indeed, multiple measurements of a true intensity of zero should be equally positive and negative. This is one reason why when possible it is better to use I rather than |F| in structure determination and refinement. The ‘best’ (most likely) estimate of |F| is larger than $I^{1/2}$ for weak intensities, since we know |F| > 0, but $|F| = I^{1/2}$ is a good estimate for stronger intensities, roughly those with I > 3σ(I). The programs TRUNCATE and its newer version CTRUNCATE estimate |F| from I and σ(I), where the prior probability of the true intensity p(J) is estimated from the average intensity in the same resolution range (French & Wilson, 1978). 6.
Intensity statistics and crystal pathologies At the end stage of data reduction, after scaling and merging, the distribution of intensities and its variation with resolution can indicate problems with the data, notably twinning (see, for example, Lebedev et al., 2006; Zwart et al., 2008). The simplest expected intensity statistics as a function of resolution $s = \sin\theta/\lambda$ arise from assuming that atoms are randomly placed in the unit cell, in which case $\langle I\rangle(s) = \langle FF^*\rangle(s) = \sum_j g(j, s)^2$, where $g(j, s)$ is the scattering from the jth atom at resolution s. This average intensity falls off with resolution mainly because of atomic motions (B factors). If all atoms were equal and had equal B factors, then $\langle I\rangle(s) = C\exp(-2Bs^2)$ and the ‘Wilson plot’ of $\log[\langle I\rangle(s)]$ against $s^2$ would be a straight line of slope $-2B$. The Wilson plot for proteins shows peaks at ∼10 and 4 Å and a dip at ∼6 Å arising from the distribution of interatomic spacings in polypeptides (fewer atoms 6 Å apart than 4 Å apart), but the slope at higher resolution does give an indication of the average B factor and an unusual shape can indicate a problem (e.g. $\langle I\rangle$ increasing at the outer limit, spuriously large $\langle I\rangle$ owing to ice rings etc.). For detection of crystal pathologies we are not so interested in resolution dependence, so we can use normalized intensities $Z = I/\langle I\rangle(s) \simeq |E|^2$ which are independent of resolution and should ideally be corrected for anisotropy (as is performed in CTRUNCATE). Two useful statistics on Z are plotted by CTRUNCATE: the moments of Z as a function of resolution and its cumulative distribution. While $\langle Z\rangle(s) = 1.0$ by definition, its second moment $\langle Z^2\rangle(s)$ (equivalent to the fourth moment of E) is >1.0 and is larger if the distribution of Z is wider. The ideal value of $\langle E^4\rangle$ is 2.0, but it will be smaller for the narrower intensity distribution from a merohedral twin (too few weak reflections), equal to 1.5 for a perfect twin and larger if there are too many weak reflections, e.g.
from a noncrystallographic translation which leads to a whole class of reflections being weak. The cumulative distribution plot of N(z), the fraction of reflections with Z |L| and $N(|L|) = |L|(3 - L^2)/2$ for a perfect twin. This test seems to be largely unaffected by anisotropy or translational noncrystallographic symmetry which may affect tests on Z. The calculation of $Z = I/\langle I\rangle(s)$ depends on using a suitable value for $\langle I\rangle(s)$ and noncrystallographic translations or uncorrected anisotropy lead to the use of an inappropriate value for $\langle I\rangle(s)$. These statistical tests are all unweighted, so it may be better to exclude weak high-resolution data or to examine the resolution dependence of, for example, the moments of Z (or possibly L). It is also worth noting that fewer weak reflections than expected may arise from unresolved closely spaced spots along a long real-space axis, so that weak reflections are contaminated by neighbouring strong reflections, thus mimicking the effect of twinning. 7. Summary: questions and decisions In the process of data reduction, a number of decisions need to be taken either by the programs or by the user. The main questions and considerations are as follows. (i) What is the point group or Laue group? This is usually unambiguous, but pseudosymmetry may confuse the programs and the user. Close examination of the scores for individual symmetry elements from POINTLESS may suggest lower symmetry groups to try. (ii) What is the space group? Distinction between screw axes and pure rotations from axial systematic absences is often unreliable and it is generally a good idea to try all the likely space groups (consistent with the Laue group) in the key structure-solution step: either molecular-replacement searches or substructure searches in experimental phasing. For example, in a primitive orthorhombic system the eight possible groups $P2_x2_x2_x$ should be tried.
This has the added advantage of providing some negative controls on the success of the structure solution. (iii) Is there radiation damage: should data collected after the crystal has had a high dose of radiation be ignored (possibly at the expense of resolution)? Cutting back data from the end may reduce completeness and the optimum trade-off is hard to choose. (iv) What is the best resolution cutoff? An appropriate choice of resolution cutoff is difficult and sometimes seems to be performed mainly to satisfy referees. On the one hand, cutting back too far risks excluding data that do contain some useful information. On the other hand, extending the resolution further makes all statistics look worse and may in the end degrade maps. The choice is perhaps not as important as is sometimes thought: maps calculated with slightly different resolution cutoffs are almost indistinguishable. (v) Is there an anomalous signal detectable in the intensity statistics? Note that a weak anomalous signal may still be useful even if it is not detectable in the statistics. The statistics do give a good guide to a suitable resolution limit for location of the substructure, but the whole resolution range should be used in phasing. (vi) Are the data twinned? Highly twinned data sets can be solved by molecular replacement and refined, but probably not solved by experimental phasing methods. Partially twinned data sets can often be solved by ignoring the twinning and then refined as a twin. (vii) Is this data set better or worse than those previously collected? One of the best things to do with a bad data set is to throw it away in favour of a better one. With modern synchrotrons, data collection is so fast that we usually have the freedom to collect data from several equivalent crystals and choose the best.
In most cases the data-reduction process is straightforward, but in difficult cases critical examination of the results may make the difference between solving and not solving the structure.
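The second-moment test from section 6 can be checked numerically with a small simulation. The sketch below assumes the standard acentric Wilson model, in which normalized intensities follow an exponential distribution with mean 1, and models a perfect twin as the average of two independent intensities. It treats all reflections as a single resolution shell and ignores anisotropy, so it illustrates the statistic only and is not what CTRUNCATE computes. The function names are hypothetical.

```python
import random


def second_moment_of_z(intensities):
    """Return <Z^2> where Z = I / <I>, using one shell for all reflections."""
    mean_intensity = sum(intensities) / len(intensities)
    return sum((i / mean_intensity) ** 2 for i in intensities) / len(intensities)


def simulate_acentric(n, rng):
    """Untwinned acentric intensities: exponential with mean 1 (Wilson model)."""
    return [rng.expovariate(1.0) for _ in range(n)]


def simulate_perfect_twin(n, rng):
    """Perfect hemihedral twin: each observation averages two independent intensities."""
    return [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    untwinned = second_moment_of_z(simulate_acentric(100_000, rng))
    twinned = second_moment_of_z(simulate_perfect_twin(100_000, rng))
    # Expected to be close to 2.0 and 1.5 respectively, up to sampling noise.
    print(f"untwinned <Z^2> = {untwinned:.2f}, perfect twin <Z^2> = {twinned:.2f}")
```

With real data the moment would be computed per resolution shell, as the text recommends, so that weak high-resolution data do not dominate the statistic.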

            "Ab initio" structure solution from electron diffraction data obtained by a combination of automated diffraction tomography and precession technique.

            Using a combination of our recently developed automated diffraction tomography (ADT) module with precession electron technique (PED), quasi-kinematical 3D diffraction data sets of an inorganic salt (BaSO4) were collected. The lattice cell parameters and their orientation within the data sets were found automatically. The extracted intensities were used for "ab initio" structure analysis by direct methods. The data set covered almost the complete set of possible symmetrically equivalent reflections for an orthorhombic structure. The structure solution in one step delivered all heavy (Ba, S) as well as light atoms (O). Results of the structure solution using direct methods, charge flipping and maximum entropy algorithms as well as structure refinement for three different 3D electron diffraction data sets were presented.

              Electron-crystallographic refinement of the structure of bacteriorhodopsin.

              Using electron diffraction data corrected for diffuse scattering together with additional phase information from 30 new images of tilted specimens, an improved experimental density map has been calculated for bacteriorhodopsin. The atomic model has then been rebuilt into this new map with particular attention to the surface loops. All the residues from 7 to 227 as well as ten lipid molecules are now included, although a few amino acid residues in three of the six surface loops, about half of the lipid hydrophobic chains and all of the lipid head groups are disordered. The model has then been refined against the experimental diffraction amplitudes to an R-factor of 28% at 3.5 Å resolution with strict geometry (0.005 Å bond length deviation) using the improvement of the "free" phase residual between calculated and experimental phases from images as an objective criterion of accuracy. For the refinement some new programs were developed to restrain the number of parameters, to be compatible with the limited resolution of our data. In the final refined model of the protein (2BRD), compared with earlier co-ordinates (1BRD), helix D has been moved towards the cytoplasm by almost 4 Å, and the overall accuracy of the co-ordinates of residues in the other six helices has been improved. As a result the positions of nearly all the important residues in bacteriorhodopsin are now well determined. In particular, the buried, protonated Asp115 is 7 Å from, and so not in contact with, the retinal and Met118 forms a cap on the pocket occupied by the beta-ionone ring. No clear density exists for the side-chain of Arg82, which forms a central part of the extracellular half-channel. The only arginine side-chain built into good density is that of Arg134 at the extracellular end of helix E, the others being disordered near one of the two surfaces.
The interpretation of the end of helix F on the extracellular surface is now clearer; an extra loose helical turn has been built bringing the side-chain of Glu194 close to Arg134 to form a probable salt bridge. The model provides an improved framework for understanding the mechanism of the light-driven proton pumping. A number of cavities that could contain water molecules were found by searching the refined model, most of them above or below the Schiff base in the half-channels leading to the two surfaces. The ordered and disordered regions of the structure are described by the temperature factor distribution.

                Author and article information

                Journal
                101215604
                32338
                Nature Methods (Nat. Methods)
                1548-7091
                1548-7105
                23 July 2014
                03 August 2014
                September 2014
                01 March 2015
                Volume: 11
                Issue: 9
                Pages: 927-930
                Affiliations
                [1 ]Janelia Research Campus, Howard Hughes Medical Institute, Ashburn VA, USA
                [2 ]Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.
                Author notes
                Corresponding author: Tamir Gonen, PhD, Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA. Ph: 571.209.4261; Fax: 571.291.6449
                [* ]To whom correspondence may be addressed. gonent@janelia.hhmi.org

                AUTHOR CONTRIBUTIONS B.L.N. contributed to project design, conception, data collection, data analysis, manuscript writing, figure making.

                D.S. contributed to project design, conception, data collection, data analysis, manuscript writing, figure making.

                A.G.W.L. contributed to data processing and analysis in MOSFLM and manuscript writing.

                T.G. contributed to project design, conception, data analysis, and manuscript writing.

                Article
                EMS59497
                10.1038/nmeth.3043
                4149488
                25086503
                129a8cec-5806-496c-a7b3-697641d6e52b
                History
                Categories
                Article

                Life sciences
