
      Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease



          The CRISPR-associated protein Cas9 is an RNA-guided endonuclease that cleaves double-stranded DNAs bearing sequences complementary to a 20-nucleotide segment in the guide RNA [1, 2]. Cas9 has emerged as a versatile molecular tool for genome editing and gene expression control [3]. RNA-guided DNA recognition and cleavage strictly require the presence of a protospacer adjacent motif (PAM) in the target DNA [1, 4-6]. Here, we report a crystal structure of Streptococcus pyogenes Cas9 complexed with a single-molecule guide RNA (sgRNA) and a target DNA containing a canonical 5′-NGG-3′ PAM. The structure reveals that the PAM motif resides in a base-paired DNA duplex. The non-complementary strand GG dinucleotide is read out via major groove interactions with conserved arginine residues from the C-terminal domain of Cas9. Interactions with the minor groove of the PAM duplex and the phosphodiester group at the +1 position in the target DNA strand contribute to local strand separation of the target DNA duplex immediately upstream of the PAM. These observations suggest a mechanism for PAM-dependent target DNA melting and RNA-DNA hybrid formation. Furthermore, this study establishes a framework for the rational engineering of Cas9 enzymes with novel PAM specificities.


          Most cited references 42


          Coot: model-building tools for molecular graphics.

          CCP4mg is a project that aims to provide a general-purpose tool for structural biologists, providing tools for X-ray structure solution, structure comparison and analysis, and publication-quality graphics. The map-fitting tools are available as a stand-alone package, distributed as 'Coot'.

            PHENIX: a comprehensive Python-based system for macromolecular structure solution

            1. Foundations 1.1. PHENIX architecture The PHENIX (Adams et al., 2002 ▶) architecture is designed from the ground up as a hybrid system of tightly integrated interpreted (‘scripted’) and compiled software modules. A mix of scripted and compiled components is invariably found in all major successful crystallographic packages, but often the scripting is added as an afterthought in an ad hoc fashion using tools that predate the object-oriented programming era. While such ad hoc systems are quickly established, they tend to become a severe maintenance burden as they grow. In addition, users are often forced into many time-consuming routine tasks such as manually converting file formats. In PHENIX, the scripting layer is the heart of the system. With only a few exceptions, all major functionality is implemented as modules that are exclusively accessed via the scripting interfaces. The object-oriented Python scripting language (Lutz & Ascher, 1999 ▶) is used for this purpose. In about two decades, a large developer/user community has produced millions of lines of highly uniform, interoperable, mature and openly available sources covering all aspects of programming ranging from simple file handling to highly sophisticated network communication and fully featured cross-platform graphical interfaces. Embedding crystallographic methods into this environment enables an unprecedented degree of automation, stability and portability. By design, the object-oriented programming model fosters shared collaborative development by multiple groups. It is routine practice to hierarchically recombine modules written by different groups into ever more complex procedures that appear uniform from the outside. A more detailed overview of the key software technology leading to all these advances, presented in the context of crystallography, can be found in Grosse-Kunstleve et al. (2002 ▶). 
In addition to the advantages outlined in the previous paragraph, the scripting language is generally most efficient for the rapid development of new algorithms. However, run­time performance considerations often dictate that numerically intensive calculations are eventually implemented in a compiled language. The first choice of a compiled language is of course to reuse the same language environment as used for the scripting language itself, which is a C/C++ environment. Not only is this the mainstream software environment on all major platforms used today, but with probably hundreds of millions of lines of C/C++ sources in existence it is an environment that is virtually guaranteed to thrive in the long term. An in-depth discussion of the combined use of Python and C++ can be found in Grosse-Kunstleve et al. (2002 ▶) and Abrahams & Grosse-Kunstleve (2003 ▶). This model is used throughout the PHENIX system. 1.2. Graphical user interface A new graphical user interface (GUI) for PHENIX was introduced in version 1.4. It uses the open-source wxPython toolkit, which provides a ‘native’ look on each operating system. Development has focused on providing interfaces around the existing command-line programs with minimal modification, using the same underlying configuration system (libtbx.phil) as used by most PHENIX programs as a template to automatically generate controls. Because these programs are implemented primarily as Python modules, complex data including models, reflections and other viewable data may be exchanged with the GUI without resorting to parsing log files. 
The current PHENIX release (version 1.5) includes GUIs for phenix.refine (Afonine et al., 2005 ▶), phenix.xtriage (Zwart et al., 2005 ▶), the AutoSol (Terwilliger et al., 2009 ▶), AutoBuild (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Adams et al., 2008 ▶) and LigandFit (Terwilliger et al., 2006 ▶) wizards, the restraints editor REEL, all of the validation tools and several utilities for creating and manipulating maps and reflection files. More recent builds of PHENIX contain a new GUI for the AutoMR wizard and future releases will include a new interface for Phaser (McCoy et al., 2007 ▶). Intrinsically graphical data is visualized with embedded graphs (using the free matplotlib Python library) or a simple OpenGL viewer. This simplifies the most complex parameters, such as atom selections in phenix.refine, which can be visual­ized or picked interactively with the built-in viewer. The GUI also serves as a platform for additional automation and user customization. Similarly to the CCP4 interface (CCP4i; Potterton et al., 2003 ▶), PHENIX manages data and task history for separate user-defined projects. Default parameters and input files can be specified for each project; for instance, the generation of ligand restraints from the phenix.refine GUI gives the user the option of automatically loading these restraints in future runs. The popularity of Python as a scientific programming language has led to its use in many other structural-biology applications, especially molecular-graphics software. The PHENIX GUI includes extension modules for the modeling programs Coot (Emsley & Cowtan, 2004 ▶) and PyMOL (DeLano, 2002 ▶), both of which are controlled remotely from PHENIX using the XML-RPC protocol. This allows the interfaces to integrate seamlessly; any model or map in PHENIX can be automatically opened in Coot with a single click. 
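The remote-control idea can be illustrated with Python's standard XML-RPC modules. This is a minimal sketch only: `ToyViewer`, `load_model` and `recentre` are hypothetical names invented for the example and are not the actual commands exposed by Coot, PyMOL or PHENIX.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyViewer:
    """Stand-in for a graphics program that accepts remote commands."""

    def __init__(self):
        self.loaded = []      # files opened so far
        self.centre = None    # current view centre

    def load_model(self, path):
        # Hypothetical command: record the file, return how many are open.
        self.loaded.append(path)
        return len(self.loaded)

    def recentre(self, x, y, z):
        # Hypothetical command: move the view centre to the given point.
        self.centre = (x, y, z)
        return True


def start_viewer_server(viewer, host="127.0.0.1"):
    """Expose the viewer's public methods over XML-RPC on a free port."""
    server = SimpleXMLRPCServer((host, 0), logRequests=False)
    server.register_instance(viewer)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    viewer = ToyViewer()
    server = start_viewer_server(viewer)
    port = server.server_address[1]
    client = ServerProxy(f"http://127.0.0.1:{port}")
    client.load_model("refined_001.pdb")   # illustrative file name
    client.recentre(12.5, -3.0, 40.25)
    print(viewer.loaded, viewer.centre)
    server.shutdown()
```

Because the controlling program only needs an HTTP endpoint and a method name, the same pattern works in either direction, which is consistent with the two-way control between the GUI and the graphics programs that the text describes.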
In programs that iteratively rebuild or refine structures, such as AutoBuild and phenix.refine, the current model and maps will be continually updated in Coot and/or PyMOL as soon as they are available. In the validation utilities, clicking on any atom or residue flagged for poor statistics will recentre the graphics windows on that atom. Remote control of the PHENIX GUI is also simple using the same protocol and simple extensions to the Coot interface provide direct launching of phenix.refine with a model pre-loaded. 2. Analysis of experimental data PHENIX has a range of tools for the analysis, validation and manipulation of X-ray diffraction data. A comprehensive tool for analyzing X-ray diffraction data is phenix.xtriage (Zwart et al., 2005 ▶), which carries out tests ranging from space-group determination and detection of twinning to detection of anomalous signal. These tests provide the user and the various wizards with a set of statistics that characterize a data set. For analysis of twinning, phenix.xtriage consolidates a number of statistics to provide a balanced verdict of possible symmetry and twin-related issues with the data. Phenix.xtriage provides the user with feedback on the overall characteristics of the data. Routine usage of phenix.xtriage during or immediately after data collection has resulted in the timely discovery of twinning or other issues (Flynn et al., 2007 ▶; Kostelecky et al., 2009 ▶). Detection of these idiosyncrasies in the data typically reduces the overall effort in a successful structure determination. A likelihood-based estimation of the overall anisotropic scale factor is performed using the likelihood formalism described by Popov & Bourenkov (2003 ▶). Database-derived standard Wilson plots for proteins and nucleic acids are used to detect anomalies in the mean intensity. These anomalies may arise from ice rings or other issues (Morris et al., 2004 ▶). Data strength and low-resolution completeness are also analysed. 
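Two of the intensity statistics that typically feed a twinning verdict can be sketched in a few lines. This is an illustrative toy, not the phenix.xtriage implementation: it assumes acentric reflections in a single resolution bin, and it pairs consecutive list entries where a real L-test would pair reflections that are close in reciprocal space and not related by a twin law. The helper names are invented for the example.

```python
import random


def mean_abs_e2_minus_1(intensities):
    """<|E^2 - 1|>, with E^2 = I / <I> (one resolution bin assumed)."""
    mean_i = sum(intensities) / len(intensities)
    return sum(abs(i / mean_i - 1.0) for i in intensities) / len(intensities)


def mean_abs_l(intensities):
    """<|L|>, with L = (I1 - I2) / (I1 + I2) over consecutive pairs."""
    pairs = zip(intensities[0::2], intensities[1::2])
    values = [abs(a - b) / (a + b) for a, b in pairs if a + b > 0]
    return sum(values) / len(values)


def simulate_acentric(n, twin_fraction=0.0, seed=0):
    """Toy data: Wilson (exponential) intensities mixed by a twin fraction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        i1 = rng.expovariate(1.0)
        i2 = rng.expovariate(1.0)   # intensity of the twin-related mate
        data.append((1.0 - twin_fraction) * i1 + twin_fraction * i2)
    return data


if __name__ == "__main__":
    for alpha in (0.0, 0.5):
        data = simulate_acentric(100_000, twin_fraction=alpha)
        print(alpha, mean_abs_e2_minus_1(data), mean_abs_l(data))
```

For untwinned acentric data the theoretical values are about 0.736 for 〈|E² − 1|〉 and 0.5 for 〈|L|〉; for a perfect twin they drop to about 0.541 and 0.375, so a shift of both statistics toward the lower values points to twinning.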
The presence of anomalous signal is detected by analysis of the measurability, a quantity expressing the fraction of statistically significant Bijvoet differences in a data set (Zwart, 2005). The native Patterson function is used to detect the presence of pseudo-translational symmetry. A database-derived empirical distribution of maximum peak heights is used to assign significance to detected peaks in the Patterson function. A comprehensive automated twinning analysis is performed. Twin laws are derived from first principles to facilitate the identification of pseudo-merohedral cases. Amplitude and intensity ratios, 〈|E² − 1|〉 values, the L-statistic (Padilla & Yeates, 2003) and N(Z) plots are derived from data cut to the resolution limit suggested by the data-strength analysis. The removal of shells of data with relatively high noise content greatly improves the automated interpretation of these statistics. A Britton plot, H-test and a likelihood-derived approach are used to estimate twin fractions when twin laws are present. If a model has been supplied, an R versus R (Lebedev et al., 2006) analysis is carried out. This type of analysis is of particular use when dealing with pseudo-symmetry, space-group problems and twinning (Zwart et al., 2008). To test for inconsistent indexing between different data sets, a set of reindexing laws is derived from first principles given the unit cells and space groups of the sample and reference data sets. A correlation analysis suggests the most likely choice of reindexing of the data. Analysis of the metric symmetry of the unit cell provides a number of likely point groups. A likelihood-inspired method is used to suggest the most likely point group of the data. Subsequent analysis of systematic absences in a likelihood framework then ranks the candidate space groups (details to be published). 3. 
Substructure determination, phasing and molecular replacement After ensuring that the diffraction data are sound and understood, the next critical necessity for solving a structure is the determination of phases using one of several strategies (Adams, Afonine et al., 2009 ▶). 3.1. Substructure determination The substructure-determination procedure implemented as phenix.hyss (Hybrid Substructure Search; Grosse-Kunstleve & Adams, 2003 ▶) combines the multi-trial dual-space recycling approaches pioneered by Shake-and-Bake (Miller et al., 1994 ▶) and later SHELXD (Sheldrick, 2008 ▶) with the use of the fast translation function (Navaza & Vernoslova, 1995 ▶; Grosse-Kunstleve & Brunger, 1999 ▶). The fast translation function is the basis for a systematic search in the Patterson function (performed in reciprocal space), in contrast to the stochastic alternative of SHELXD (performed in direct space). Phenix.hyss is the only substructure-determination program to fully integrate automatic comparison of the substructures found in multiple trials via a Euclidean Model Matching procedure (part of the cctbx open-source libraries). This allows phenix.hyss to detect if the same solution was found multiple times and to terminate automatically if this is the case. Extensive tests with a variety of SAD data sets (Grosse-Kunstleve & Adams, 2003 ▶) have led to a parameterization of the procedure that balances runtime considerations and the likelihood that repeated solutions present the correct substructure. In many cases the procedure finishes in seconds if the substructure is detectable from the input data. 3.2. Phasing Phaser, available in PHENIX as phenix.phaser, applies the principle of maximum likelihood to solving crystal structures by molecular replacement, by single-wavelength anomalous diffraction (SAD) or by a combination of both. 
The likelihood targets take proper account of the effects of different sources of error (and, in the case of SAD phasing, their correlations) and allow different sources of information to be combined. In solving a molecular-replacement problem with a number of different components, the information gained from a partial solution increases the signal in the search for subsequent components. Because the likelihood scores for different models can be directly compared, decisions among models can readily be made as part of automation strategies (discussed below). 3.3. Noncrystallographic symmetry (NCS) Noncrystallographic symmetry is an important feature of many macromolecular crystals that can be used to greatly improve electron-density maps. PHENIX has tools for the identification of NCS and for using NCS and multiple crystal forms of a macromolecule in phase improvement. Phenix.find_ncs and phenix.simple_ncs_from_pdb are tools for the identification of noncrystallographic symmetry in a structure using information from a heavy-atom substructure or an atomic model. Phenix.simple_ncs_from_pdb will identify NCS and generate transformations from the chains in a model in a PDB file. Phenix.find_ncs will identify NCS from either a heavy-atom substructure (Terwilliger, 2002a ▶) or the chains in a PDB file and will then compare this NCS with the density in a map to verify that the NCS is actually present. Phenix.multi_crystal_average is a method for combining information from several crystal forms of a structure. It is especially well suited to cases where each crystal form has its own NCS, adjusting phases for each crystal form so that all the NCS copies in all crystals are as similar as possible. NCS restraints should normally be applied in density modification and model building in all cases except where there is clear evidence that NCS is not present. 
In density modification within PHENIX the presence of NCS is identified from the heavy-atom sites or from an atomic model if available. The local correlation of density in NCS-related locations is then used automatically to set variable restraints on NCS symmetry in the map. In refinement, NCS symmetry is applied through coordinate restraints, targeting the positions of each NCS copy relative to those of the other NCS-related chains. The default NCS restraints in PHENIX are very tight, with targets of 0.05 Å r.m.s. At resolutions lower than about 2.5 Å these tight restraints on NCS should usually be applied. At higher resolutions it may be appropriate to use looser restraints or to remove them altogether. Additionally, if there are segments of the chains that clearly do not obey the NCS relationships they should be excluded from the NCS restraints. Normally this is performed automatically, but it can also be specified explicitly. 4. Model building, ligand fitting and nucleic acids Key steps in the analysis of a macromolecular crystal structure are building an initial core model, identification and fitting of ligands into the electron-density map and building an atomic model for loop regions that are less well defined than the majority of the structure. PHENIX has tools for rapid model building of secondary structure and main-chain tracing (phenix.find_helices_strands) and for the fitting of flexible ligands (phenix.ligandfit) as well as for fitting a set of ligands to a map (phenix.find_all_ligands) and for the identification of ligands in a map (phenix.ligand_identification). PHENIX additionally has a tool for the fitting of missing loops (phenix.fit_loops). Validation tools are provided so that the models produced can be validated at each step along the way. 4.1. Model building Phenix.find_helices_strands will rapidly build a secondary-structure-only model into a map or very rapidly trace the polypeptide backbone of a model into a map. 
To build secondary structure in a map, phenix.find_helices_strands identifies α-helical regions and β-strand segments, models idealized helices and strands into the corresponding density, allowing for bending of the helices and strands, and assembles these into a composite model. To very rapidly trace the main chain in a map, phenix.find_helices_strands finds points along ridgelines of high density where Cα atoms might be located, identifies pairs and then triplets of these Cα atoms that have density between the atoms and plausible geometry, constructs all possible connections of these Cα atoms into nonamers and then identifies all the longest possible chains that can be made by joining the nonamers. This process can build a Cα model at a rate of about 20 residues per second, yielding a backbone model that can readily be interpreted visually or automatically to evaluate the quality of the map that it is based on. Phenix.fit_loops will fit missing loops in an atomic model. It uses RESOLVE model building (Terwilliger, 2003a ▶,b ▶,c ▶) to extend the chain from either end where a loop is missing and to connect the chains into a loop with the expected number of residues. 4.2. Ligand fitting Phenix.ligandfit is a tool for fitting a flexible ligand into an electron-density map (Terwilliger et al., 2006 ▶). The key approaches used are breaking the ligand into its component rigid-body parts, finding where each of these can be placed into density, tracing the remainder of the ligand based on the positions of these core rigid-body parts and recombining the best parts of multiple fits while scoring based on the fit to the density. Phenix.find_all_ligands is a tool for finding all the instances of each of several ligands in an electron-density map. Phenix.find_all_ligands finds the largest contiguous region of unused density in a map and uses phenix.ligandfit to fit each supplied ligand into that density. 
It then chooses the ligand that has the highest real-space correlation to the density (Terwilliger, Adams et al., 2007 ▶). It then repeats this process until no ligands can be satisfactorily fitted into any remaining density in the map. Phenix.ligand_identification is a tool for identifying which ligands are compatible with unknown electron density in a map (Terwilliger, Adams et al., 2007 ▶). It can search using the 200 most common ligands from the PDB or from a user-supplied list of ligands. Phenix.ligand_identification uses phenix.ligandfit to fit each ligand to the map and identifies the best-fitting ligand using the real-space correlation and surface complementarity of the ligand and the atoms in the structure surrounding the ligand-binding site. 4.3. RNA and DNA In common with most macromolecular crystallographic tools, PHENIX was originally developed with protein structures primarily in mind. Now that nucleic acids, and especially RNA, are increasingly important in large biological structures, the system is being modified in places where subtle differences in procedure are needed rather than just the relevant libraries. Model building in phenix.autobuild now has a preliminary set of nucleic acid procedures that take advantage of the relatively well determined phosphate and base positions, as well as the preponderance of double helix, and that make use of the RNA backbone conformers recently defined by the RNA Ontology Consortium (Richardson et al., 2008 ▶). Nucleic acid structures benefit significantly from torsion-angle refinement, which has recently been added to the options in phenix.refine. A principal problem in RNA models is getting the ribose pucker correct, although it is known to consist almost entirely of either C3′-endo (which is commoner and that found in the A-form helix) or C2′-endo (Altona & Sundaralingam, 1972 ▶). 
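One geometric way to tell the two puckers apart is a point-to-line distance: how far the 3′ phosphate lies from the axis of the C1′ to N1/N9 glycosidic bond. The sketch below shows only that geometry. The function names are invented for the example, and the 2.9 Å cutoff is an assumption about the commonly quoted threshold, not a value taken from this text.

```python
import math


def perpendicular_distance(p, c1, n):
    """Distance from point p to the infinite line through c1 and n.

    p  : 3' phosphate position (x, y, z) in Angstrom
    c1 : C1' position
    n  : N1 (pyrimidine) or N9 (purine) position
    """
    axis = [b - a for a, b in zip(c1, n)]
    rel = [b - a for a, b in zip(c1, p)]
    cross = (
        rel[1] * axis[2] - rel[2] * axis[1],
        rel[2] * axis[0] - rel[0] * axis[2],
        rel[0] * axis[1] - rel[1] * axis[0],
    )
    axis_len = math.sqrt(sum(c * c for c in axis))
    if axis_len == 0.0:
        raise ValueError("C1' and N coincide; the bond axis is undefined")
    return math.sqrt(sum(c * c for c in cross)) / axis_len


def classify_pucker(distance, cutoff=2.9):
    """Long distance -> C3'-endo, short -> C2'-endo (cutoff is assumed)."""
    return "C3'-endo" if distance >= cutoff else "C2'-endo"


if __name__ == "__main__":
    d = perpendicular_distance((5.0, 3.0, 4.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    print(d, classify_pucker(d))
```

The appeal of such a test is that it relies on the phosphate and base positions, which are relatively well determined in nucleic acid maps, rather than on the ribose ring atoms themselves.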
MolProbity uses the perpendicular distance from the 3′ phosphate to the line of the C1′–N1/9 glycosidic bond as a reliable diagnostic of ribose pucker (Davis et al., 2007; Chen et al., 2010). This same test has now been built into phenix.refine to allow the use of pucker-specific target parameters for bond lengths, angles and torsions (Gelbin et al., 1996) rather than the uneasy compromise values (Parkinson et al., 1996) used in most pucker-agnostic refinement. Currently, if an incorrect pucker is diagnosed it must usually be fixed by user rebuilding, for instance in Coot (Emsley & Cowtan, 2004) or in RNABC (Wang et al., 2008). A rebuilding functionality will probably be incorporated into PHENIX soon, but in the meantime the refinement will now correctly maintain the geometry of a C2′-endo pucker once it has been built and identified using conformation-specific residue names. 4.4. Maps, models and avoiding bias Phenix.refine (and the graphical tool phenix.create_maps) can produce various types of maps, including anomalous difference maps, maximum-likelihood weighted (p·mF_obs − q·DF_model)exp(iα_model) maps, regular (p·F_obs − q·F_model)exp(iα_model) maps, where p and q are any user-defined numbers, filled maps and kick maps. The coefficients m and D of likelihood-weighted maps (Read, 1986) are computed using test-set reflections as described in Lunin & Skovoroda (1995) and Urzhumtsev et al. (1996). Data incompleteness, especially systematic incompleteness, can cause map distortions (Lunin, 1988; Tronrud, 1997). An approach to remedying this problem is to replace (‘fill’) missing observations with nonzero values. One can use DF_model (similarly to REFMAC; Murshudov et al., 1997) to replace the missing F_obs or use 〈F_obs〉, where the F_obs are averaged across a resolution bin around the missing F_obs value. Based on a limited number of tests, both ‘filling’ schemes produce similar results, reiterating the importance of phases. 
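The two filling schemes can be sketched as follows. This is a toy under stated assumptions: reflections are plain dictionaries with hypothetical keys `d` (resolution in Å), `f_obs` (amplitude, or `None` when missing) and `d_f_model` (the D-weighted model amplitude), and the bins are equal-width in 1/d², which is a choice made for the example rather than something the text specifies.

```python
from statistics import mean


def fill_missing_f_obs(reflections, n_bins=10, scheme="bin_mean"):
    """Return amplitudes with missing F_obs replaced.

    scheme="model"    : use D*F_model for each missing reflection
    scheme="bin_mean" : use <F_obs> of the observed reflections in the
                        same resolution bin (falls back to D*F_model
                        when the bin holds no observations)
    """
    s2 = [1.0 / r["d"] ** 2 for r in reflections]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_bins or 1.0   # guard against a single shell

    def bin_of(value):
        return min(int((value - lo) / width), n_bins - 1)

    observed = {}
    for r, v in zip(reflections, s2):
        if r["f_obs"] is not None:
            observed.setdefault(bin_of(v), []).append(r["f_obs"])

    filled = []
    for r, v in zip(reflections, s2):
        if r["f_obs"] is not None:
            filled.append(r["f_obs"])          # measured data are kept
        elif scheme == "model":
            filled.append(r["d_f_model"])
        else:
            values = observed.get(bin_of(v))
            filled.append(mean(values) if values else r["d_f_model"])
    return filled
```

Only the amplitudes are touched here; in either scheme the filled terms would still take their phases from the model, which is where any model bias enters.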
However, it is important to keep in mind that by replacing missing F obs there is a risk of introducing bias and obviously the more incomplete the data is the larger the risk. At present it is advisable to use both maps simultaneously: filled and not filled. An average kick map (AK map; Gunčar et al., 2000 ▶; Turk, 2007 ▶; Pražnikar et al., 2009 ▶) is the result of the following procedure. A large ensemble of structures is created where the coordinates of each structure from the ensemble are all randomly shaken. A map is then computed for each structure. Finally, all maps are averaged to generate one AK map. An AK map is expected to have less bias and less noise and to enhance the existing signal and can potentially clarify some initially bad densities. A computationally intensive but powerful method of creating a very low-bias map is to carry out iterative model building and refinement while omitting one region of the map from all calculations of structure factors (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Adams et al., 2008 ▶). The phenix.autobuild iterative-build OMIT map procedure carries this out automatically for either a single OMIT region or for overlapping OMIT regions to create a composite iterative-build OMIT map. 5. Model, and model-to-data, validation The result of crystallographic structure determination is the atomic model. There are three principal components in assessing model quality: the covalent model geometry, the model stereochemistry and the quality of fit between the model and experimental data in both real space and in reciprocal space. All three provide overall measures, and the first two plus the real-space aspect of the third also provide checks for local outliers, which give the best leverage for user intervention to actively improve model accuracy (Arendall et al., 2005 ▶). (Validation of the experimental data was described in §2 above.) 
PHENIX includes many individual tools for specific aspects of validation, plus several systems that combine those results into overall summaries. Validation is provided both for user evaluation of the progress and results of a structure solution and also to help inform the automated choices made by other parts of the system. Most aspects of the MolProbity model-validation tools (Davis et al., 2007; Chen et al., 2010) have been adapted or rewritten for integrated use within PHENIX and are presented to the user by the new GUI (§1.2). H atoms are added by phenix.reduce, with optimization of entire local hydrogen-bond networks, consideration of the first layer of crystallographic waters and optional correction of side-chain amide or histidine 180° ‘flips’ (Word, Lovell, Richardson et al., 1999). All-atom contacts (Word, Lovell, LaBean et al., 1999) are calculated by phenix.probe, which provides the atomic overlap information needed for the validation of serious all-atom steric clashes and can also be visualized in Coot. For the PHENIX GUI, the set of MolProbity-based tools provides both overall model statistics, such as clashscore and percentage of outliers, and detailed lists of the Ramachandran (Lovell et al., 2003), rotamer (Lovell et al., 2000), Cβ deviation (Lovell et al., 2003) and clash outliers. Command-line tools are available for these validation methods: phenix.rotalyze, phenix.ramalyze, phenix.cbetadev, phenix.clashscore, phenix.reduce and phenix.probe. Additionally, phenix.validate_model, which analyzes the deviations of bond lengths, bond angles, planarity etc. from ideal library values, complements the MolProbity torsional and atomic clash tools. Phenix.real_space_correlation assesses the local model-to-data correspondence by providing a quantitative measure of how well the atomic model fits the electron-density map at the residue or atom level (depending on the resolution). 
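A real-space correlation of this kind is usually a Pearson correlation coefficient between two sets of density values sampled at the same grid points around a residue or atom. The sketch below assumes that the sampling has already been done; `real_space_cc` and `per_residue_cc` are names invented for the example, not the phenix.real_space_correlation interface.

```python
import math


def real_space_cc(map_values, model_values):
    """Pearson correlation between observed and model density samples."""
    if len(map_values) != len(model_values) or len(map_values) < 2:
        raise ValueError("need two equally long lists of at least 2 points")
    n = len(map_values)
    mean_a = sum(map_values) / n
    mean_b = sum(model_values) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(map_values, model_values))
    var_a = sum((a - mean_a) ** 2 for a in map_values)
    var_b = sum((b - mean_b) ** 2 for b in model_values)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("flat density: correlation is undefined")
    return cov / math.sqrt(var_a * var_b)


def per_residue_cc(samples):
    """samples: {residue_id: (map_values, model_values)} -> {residue_id: cc}."""
    return {res: real_space_cc(a, b) for res, (a, b) in samples.items()}


if __name__ == "__main__":
    demo = {
        "A/12": ([0.2, 0.9, 1.4, 0.8], [0.1, 1.0, 1.5, 0.7]),   # good fit
        "A/13": ([0.2, 0.9, 1.4, 0.8], [1.2, 0.3, 0.1, 0.9]),   # poor fit
    }
    for residue, cc in per_residue_cc(demo).items():
        print(residue, round(cc, 3))
```

Because the coefficient is insensitive to overall scale and offset, it can be compared between residues without first putting the two maps on a common scale.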
Rapidly obtaining a snapshot of global figures of merit for a crystallo­graphic model and associated experimental data is a frequent task that is performed at all stages of structure solution. This task can be complicated for several reasons: the presence of novel ligands or nonstandard residues in the PDB-format (Berman et al., 2000 ▶) coordinate file, data collected from twinned crystals, various reflection datafile formats, different representation of atomic displacement parameters in the presence of TLS (Schomaker & Trueblood, 1968 ▶), experimental data type (X-­ray and/or neutron), files with multiple models and various formatting issues. Phenix.model_vs_data is designed to automatically handle all these complications with minimal user input (a PDB file and a reflection data file) and provide a concise summary output. Phenix.polygon (Urzhumtseva et al., 2009 ▶) is a graphical tool that is designed to indicate the similarity of validation parameters, such as free R value, for a particular structure compared with those deposited in the PDB. This comparison is performed for all other structures solved at similar resolution limits. The result is presented graphically. Phenix.validation combines all of the tools described above in one GUI, providing a single place for assessing the results of structure determination. 5.1. Model and structure-factor manipulation and analysis PHENIX has a range of tools for displaying, analyzing and manipulating structure-factor and model information. Phenix.mtz.dump and phenix.cif_as_mtz display and convert structure-factor data. Phenix.print_sequence, phenix.pdb_atom_selection and phenix.pdbtools display and manipulate coordinate files. Phenix.tls is a tool for the extraction and manipulation of TLS information. Using this tool, TLS matrices and selections can be extracted from REFMAC- or PHENIX-formatted PDB file headers and the total or residual atomic B factors can be computed and output. 
Future functionality will include the complete analysis of TLS matrices and their graphical visual­ization. Phenix.get_cc_mtz_mtz and phenix.get_cc_mtz_pdb are tools for analyzing the agreement between maps based on a pair of MTZ files or between maps calculated from an MTZ file and a PDB file. The key attributes of these tools are that they automatically search all allowed origin shifts that might relate the two maps and that they write out a modified version of one of the MTZ files or of the PDB file, shifted to match the other. 6. Structure refinement Phenix.refine is the state-of-the-art crystallographic structure-refinement engine of PHENIX. The foundational refinement machinery is a combination of highly efficient programming tools and new or rethought crystallographic algorithms. Phenix.refine possesses an extensive set of tools that cover the majority of refinement scenarios at any data resolution from low to ultrahigh. Various reflection-data formats (for example, CNS, MTZ and SHELX) are recognized automatically. The input experimental data are checked for outliers (Read, 1999 ▶; Zwart et al., 2005 ▶) and any reflections identified as such are excluded from the refinement calculations. Twinning can also be taken into account by providing a twin-law operator, which can be obtained using phenix.xtriage. Both X-ray and/or neutron diffraction data can be used and an option for joint XN refinement is available (simultaneous refinement against X-­ray and neutron data; Adams, Mustyakimov et al., 2009 ▶). Each refinement run begins with robust mask-based bulk-solvent correction and anisotropic scaling (Afonine et al., 2005 ▶). 
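A flat, mask-based bulk-solvent correction is commonly written as F_model = k_overall · exp(−B_overall·s²/4) · [F_calc + k_sol · exp(−B_sol·s²/4) · F_mask], with s = 1/d. The text does not give the formula used by phenix.refine, so the sketch below should be read as this generic form only: it replaces the anisotropic scale with a single isotropic B_overall for brevity, and the default k_sol and B_sol values are merely typical illustrative numbers.

```python
import math


def f_model(f_calc, f_mask, d_spacing,
            k_overall=1.0, b_overall=0.0, k_sol=0.35, b_sol=46.0):
    """Total model structure factor for one reflection.

    f_calc    : complex structure factor of the atomic model
    f_mask    : complex structure factor of the solvent mask
    d_spacing : resolution of the reflection in Angstrom
    k_sol     : solvent density scale (e/A^3), illustrative default
    b_sol     : solvent smearing B factor (A^2), illustrative default
    """
    s2 = 1.0 / d_spacing ** 2
    solvent = k_sol * math.exp(-b_sol * s2 / 4.0) * f_mask
    scale = k_overall * math.exp(-b_overall * s2 / 4.0)
    return scale * (f_calc + solvent)


if __name__ == "__main__":
    # The solvent term matters at low resolution and fades at high resolution.
    for d in (20.0, 6.0, 2.0):
        print(d, abs(f_model(100 + 0j, -50 + 0j, d)))
```

The exponential damping on the mask term is what confines the correction to low-resolution reflections, which is why it is applied together with overall scaling at the start of a refinement run.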
Tools such as efficient rigid-body refinement (multiple-zones algorithm; Afonine et al., 2009 ▶), simulated-annealing refinement (Brünger et al., 1987 ▶) in Cartesian or torsion-angle space (Grosse-Kunstleve et al., 2009 ▶), automatic NCS detection and its use as restraints in refinement are important at low resolution and in the initial stages of refinement. A broad range of atomic displacement parameterizations are available, including grouped isotropic, constrained anisotropic (TLS) and individual atomic isotropic or anisotropic, allowing efficient modelling of atomic displacement parameters at any resolution. Occupancy refinement (grouped, individual, group constrained for alternative conformations or any mixture) can be performed for any user-defined atoms. Atoms in alternative conformations are recognized automatically based on altLoc identifiers in the input PDB file and their occupancies are refined by default. Ordered solvent (water) model updating is integrated into the refinement process. The availability of ultrahigh-resolution data makes it possible to visualize the residual density arising from bonding effects; phenix.refine employs a novel interatomic scatterers model (Afonine et al., 2007 ▶) to adequately account for these features. A flexible parameterization of H atoms allows their use at any resolution from subatomic (where their parameters can be refined individually) to low resolution (where a riding model is used). Refinement can be performed using a variety of refinement target functions, including maximum likelihood, maximum likelihood with experimental phase information and amplitude least squares. The refinement of coordinates can be performed in real or reciprocal space (allowing dual-space refinement). Novel ligands can easily be included in refinement by providing a corresponding CIF file as input (the CIF file can be automatically created using phenix.ready_set). 
Manual fixing of amino-acid side-chain rotamers can be time-consuming, especially for large structures. Although the use of simulated-annealing refinement increases the convergence radius, it can still fail to fit incorrectly modelled side chains into the correct density. Phenix.refine has an option for automatic selection of the best rotamer based on a rotamer library (Lovell et al., 2000 ▶) and optimal fit into the density (details to be published elsewhere). Furthermore, coupling real-space refinement with the built-in rotamer library and available MolProbity tools allows the automated identification and robust correction of common systematic errors involving backward-fit conformations for Leu, Thr, Val, Ile and Arg side chains, as developed and tested in the Autofix method (Headd et al., 2009 ▶). Phenix.refine allows multi-step complex refinement protocols in which most of the available refinement strategies can be combined with each other and applied to any selected part of the model. For example, a run of phenix.refine may perform rigid-body refinement, simulated annealing, individual and grouped B factors combined with TLS refinement, constrained occupancy refinement and automatic water picking. The output of phenix.refine includes various maps (maximum-likelihood weighted, kicked, incompleteness corrected, anomalous difference and those with any user-defined coefficients), complete model and data statistics and PDB file with a formatted REMARK 3 header ready for PDB deposition. The phenix.refine GUI is integrated with Coot and PyMOL, allowing seamless visual analysis of the refined model and associated maps. Phenix.refine is tightly integrated with other PHENIX components, making structure solution, building and refinement a one-step process (for example, in the AutoMR and AutoBuild wizards). It is routinely tested by automatic re-refinement of all models in the PDB for which the experimental data are available. 6.1. 
Ligand-coordinate and restraint-geometry generation

The electronic Ligand Builder and Optimization Workbench (eLBOW; Moriarty et al., 2009) is a suite of tools designed for the reliable generation of Cartesian coordinates and geometry restraints for both novel and known ligands. In line with the rest of the PHENIX package, the eLBOW modules are written in Python, with the numerically intensive portions of the code written in C++. eLBOW is a flexible platform for converting a majority of common chemical inputs to optimized three-dimensional coordinates and geometry restraints for refinement. Ligand geometries can be minimized using the semi-empirical AM1 quantum-chemical method (Stewart, 2004), a numerically efficient and chemically accurate technique for the class of molecules commonly complexed with or bound to proteins. In addition, a graphical user interface for editing geometry restraints and simple geometry manipulation of ligands has been developed. The Restraints Editor, Especially Ligands (REEL) removes the tedium of manually editing a restraints file by providing a number of commonly performed actions via pull-down menus and other interactive features. The effect of changes in the restraints can be immediately reflected in the molecule view to provide user feedback. ReadySet! is a tool that uses many of the features of eLBOW to prepare a protein model for refinement quickly and easily. The flexibility of the Python interface is exemplified by the use of Reduce, eLBOW and several smaller portions of the cctbx toolkit to add H and/or D atoms to the model, ligands and water and to generate metal-coordination files and geometry restraints for unknown ligands. The files required for covalently bound ligands are also generated.

7. Integrated structure determination

7.1. Why automation?
Automation has dramatically changed macromolecular crystallography over the past decade, both by greatly speeding up the process of structure solution, model building and refinement and by bringing the tools for structure determination to a much wider group of scientists. As automation becomes increasingly comprehensive, it will allow users to test many more possibilities for structure determination, will allow improved estimation of uncertainties in the final structures and will allow the determination of ever more complex and difficult structures. The PHENIX environment has been developed with automation as a key and defining feature. Each tool within PHENIX can seamlessly and nearly effortlessly be incorporated as part of any other tool or process in PHENIX. This means that very complex tasks can be built up from well tested and characterized tools and that tools and higher-level methods can be re-used in many different contexts. With a full automatic regression testing system as an integral part of the PHENIX environment, all these tasks and high-level methods are tested daily to ensure the integrity of the entire PHENIX system. 7.2. Automated structure solution PHENIX has fully integrated structure-solution capability for both experimental phasing (MAD, SAD, MIR and com­binations of these), carried out by phenix.autosol, and for molecular replacement, performed by phenix.automr. Each of these automated procedures feeds directly into the iterative model building, density modification and refinement of phenix.autobuild. Phenix.autosol is designed to allow complete automation of experimental phasing while allowing a high degree of flexibility for advanced users. 
Beginning with structure-factor amplitudes and the sequence of the macromolecule, phenix.autosol uses phenix.solve (Terwilliger & Berendzen, 1999 ▶) to scale all data sets, phenix.xtriage (Zwart et al., 2005 ▶) to analyze the data for twinning and to correct any anisotropy in the data and phenix.hyss (Grosse-Kunstleve & Adams, 2003 ▶) to find potential heavy-atom or anomalously scattering atoms. Phenix.autosol carries out experimental phasing with phenix.phaser (McCoy et al., 2004 ▶, 2007 ▶) or phenix.solve (Terwilliger & Berendzen, 1999 ▶), density modification with phenix.resolve (Terwilliger, 1999 ▶) and preliminary model building using the methods in phenix.autobuild (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Zwart et al., 2008 ▶). A key step in automated structure solution is the identification of which of several possible space-group and heavy-atom or anomalously scattering-atom substructures is correct. Phenix.autosol uses a Bayesian scoring algorithm based on analysis of the experimental electron-density maps to identify which substructures lead to the best maps (Terwilliger et al., 2009 ▶). The main features of the maps that are used in this evaluation are the skewness of the electron density (non-Gaussian histogram of density with more density in the positive tail than the negative tail) and the correlation of local r.m.s. density (large contiguous regions of high variation where the molecule is located and separate large contiguous regions of low variation where the solvent is located). Phenix.autosol is highly flexible, allowing any combination of experimental data, such as MAD + SIRAS or several SAD data sets. Although it is fully automated, the user can control nearly all aspects of the operation of the procedure, including the scoring criteria and decisions about how certain phenix.autosol should be that the correct solution is contained in the current lists of solutions. 
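The skewness criterion mentioned above can be illustrated with a minimal sketch. It computes the ordinary sample skewness of a list of density values; the function name and the toy density lists are hypothetical, and this is not the Bayesian scoring actually used by phenix.autosol, only an illustration of why a map with a long positive tail scores higher than a symmetric one.

```python
def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    if m2 == 0:
        return 0.0                                  # constant map: no skew defined, report 0
    return m3 / m2 ** 1.5


# Hypothetical toy "maps": mostly flat solvent with a few strong positive peaks,
# versus a symmetric distribution with no positive tail.
peaked_map = [0.0] * 90 + [5.0] * 10
symmetric_map = [-1.0, 1.0] * 50

print(skewness(peaked_map) > skewness(symmetric_map))
```

A map with well defined macromolecular density is expected to resemble the first case, with more density in the positive tail than in the negative tail.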
Phenix.autosol can carry out phasing using a combination of experimental SAD data and molecular-replacement information. If a molecular-replacement model is available, phenix.autosol will use phenix.phaser (McCoy et al., 2004 ▶, 2007 ▶) to complete the anomalous substructure iteratively by con­structing log-likelihood gradient maps for the anomalous scatterers based on the model of the non-anomalous structure and any anomalous scatterers that have already been found. The anomalous substructure is then used along with the model to calculate phases with phenix.phaser. Phenix.automr carries out automated likelihood-based molecular replacement using phenix.phaser (Read, 2001 ▶; McCoy et al., 2005 ▶, 2007 ▶; McCoy, 2007 ▶). The procedure is highly automated, allowing several copies of each of several components to be placed in a single run, which can also test different possible choices of space group. If there are alternative choices of model for a component, the molecular-replacement calculation can try each of them in turn or combine them as a statistically weighted ensemble. Although the evaluation of the likelihood targets is slow (Read, 2001 ▶), the use of fast approximations for the rotation search (Storoni et al., 2004 ▶) and the translation search (McCoy et al., 2005 ▶) gives run times that are competitive with traditional Patterson-based methods. Likelihood has been demonstrated to be more sensitive to the correct solution, particularly in difficult cases (Read, 2001 ▶). When there are several copies or several components to place, the ability of the likelihood functions to take advantage of preliminary partial solutions can provide a crucial increase in the signal. 7.3. Iterative model building, density modification and refinement Phenix.autobuild is a highly integrated and automated procedure for model building and model improvement through iterative model building, density modification and refinement. 
Phenix.autobuild uses phenix.resolve (Terwilliger, 2003a ▶,b ▶) to carry out model building, model extension, model assembly, loop fitting and building outside existing models. It further uses phenix.resolve to improve electron-density maps with statistical density modification, including information from the newly built models as well as that obtained from experiment (e.g. phenix.autosol), from NCS (Terwilliger, 2002b ▶) and from other expected features of electron-density maps such as a flat solvent (Wang, 1985 ▶), the presence of secondary-structural features (Terwilliger, 2001 ▶) and the presence of local patterns of density characteristic of macromolecules (Terwilliger, 2003c ▶). To reduce model bias in the procedure, prime-and-switch phasing can also be used (Terwilliger, 2004 ▶). Phenix.autobuild uses phenix.refine (Afonine et al., 2005 ▶) throughout this process to improve the quality of the models that are built. Phenix.autobuild provides two complementary approaches to model building. For cases in which no model or only a preliminary model has been built, phenix.autobuild will con­struct a new model considering the main chain of any supplied models as potential coordinates. In cases where a nearly final model is available, phenix.autobuild can apply a rebuild-in-place approach in which the polypeptide chain is rebuilt a few residues at a time without changing the register or the overall features of the model. The rebuild-in-place approach in phenix.autobuild provides a powerful method for the assessment of uncertainties in an atomic model by repetitive rebuilding of the model using different random seeds for each iteration (Terwilliger, Grosse-Kunstleve et al., 2007 ▶). The variability in the coordinates of each atom in the ensemble that is created is a lower bound on the uncertainty of the position of that atom. 8. Conclusions Advances in computational methods and algorithms have made it possible to automate the solution of many structures with PHENIX. 
However, many challenges still exist. In particular, the development of automated methods that can be applied at low resolution (worse than 3.0 Å) remains a priority. In this resolution range there are typically too few experimental data to uniquely define the macromolecular structure for automated ab initio model building. Thus, methods are required that rely on prior knowledge from existing macromolecular structures to permit productive automated data interpretation. These methods will need to be developed and applied for all stages of structure solution and tightly integrated to maximize the information extracted from the experimental data.


              1. Functional specification The program package XDS (Kabsch, 1988a ▶,b ▶, 1993 ▶, 2010 ▶) was developed for the reduction of single-crystal diffraction data recorded on a planar detector by the rotation method using monochromatic X-rays. It includes a set of three programs. XDS accepts a sequence of adjacent non-overlapping rotation images from a variety of imaging-plate, CCD, pixel and multiwire area detectors, infers crystal symmetry and metrics and produces a list of corrected integrated intensities of the reflections occurring in the images in a nearly automatic way. The program assumes that each image covers the same positive amount of crystal rotation and that the rotation axis, incident beam and crystal intersect at one point, but otherwise imposes no limitations on the detector position, on the directions of the rotation axis and incident beam or on the oscillation range covered by each image. XSCALE places the data sets obtained from processing with XDS on a common scale, optionally merges them into one or several sets of unique reflections and reports their completeness and the quality of the integrated intensities. It corrects the data for absorption effects, sensitivity variations in the detector plane and radiation damage. Optionally, it can correct reflections individually for radiation damage by extrapolation to their initial intensities at zero dose. XDSCONV converts reflection data files as obtained from XDS or XSCALE into various formats required by software packages for crystal structure determination. It can generate test reflections or inherit previously selected ones which are used for the calculation of a free R factor to monitor the progress of structure refinement. 2. XDS XDS is organized into eight steps (major subroutines) which are called in succession by the main program. 
Information is exchanged between the steps by files (see Table 1 ▶), which allows the repetition of selected steps with a different set of input parameters without rerunning the whole program. The files generated by XDS are either ASCII-type files that can be inspected and modified using a text editor or binary control images saved as a byte-offset variant of the CBFlib format (Bernstein & Hammersley, 2005 ▶; Bernstein & Ellis, 2005 ▶). Such images are indicated by the file-name extension .cbf and can be looked at using the open-source program XDS-Viewer written by Michael Hoffer. All files have a fixed name defined by XDS, which makes it mandatory to process each data set in a newly created directory in order to avoid name clashes. Clearly, one should not run more than one XDS job at a time in the same directory. Output files affected by rerunning selected steps (see Table 1 ▶) should also first be given another name if their original contents are meant to be saved. Data processing begins by copying an appropriate input file into the new directory. Input-file templates are provided with the XDS package for a number of frequently used data-collection facilities. The copied input file must be renamed XDS.INP and edited to provide the correct parameter values for the actual data-collection experiment. All parameters in XDS.INP are named by keywords containing an equals sign as the last character and many of them will be mentioned here in context in order to clarify their meaning. Execution of XDS (JOB=XDS) invokes each of the eight program steps as described below. The results and diagnostics from each step are saved in files with the extension .LP attached to the program-step name. These files should always be studied carefully to see whether processing was satisfactory or, in the case of failure, to find out what could have gone wrong. 2.1. 
XYCORR This program step calculates a lookup table of additive spatial corrections at each detector pixel which is stored in the files X-CORRECTIONS.cbf, Y-CORRECTIONS.cbf. Often, the data images have already been corrected for geometrical distortions, in which case XYCORR produces tables of zeros. For spiral read-out imaging-plate detectors the small corrections resulting from radial (ROFF=) and tangential (TOFF=) offset errors of the scanner are computed. For some multiwire and CCD detectors that deliver geo­metrically distorted images, corrections are derived from a calibration image (BRASS_PLATE_IMAGE=file name). This image displays the response to a brass plate containing a regular grid of holes which is mounted in front of the detector and illuminated by an X-ray point source. Clearly, the source must be placed exactly at the location to be occupied by the crystal during the actual data collection, as photons emanating from the calibration source are meant to simulate all possible diffracted beam directions. For visual control, spots that have been located and accepted from the brass-plate image by XYCORR are marked in the file FRAME.cbf. The following problems can be encountered in this step. (i) A misplaced calibration source can lead to an incorrect lookup table, impairing the correct prediction of the observed diffraction pattern in subsequent program steps. (ii) An underexposed calibration image can result in an incomplete and unreliable list of calibration spots. 2.2. INIT INIT determines three lookup tables, saved as the files BLANK.cbf, GAIN.cbf and BKGINIT.cbf, that are required by the subsequent processing steps for classifying pixels in the data images as background or belonging to a diffraction spot (‘strong’ pixels). These tables should be inspected with the XDS-Viewer program. BLANK.cbf contains a lookup table of the detector noise. 
It is determined from a specific image recorded in the absence of X-rays (DARK_CURRENT_IMAGE=) or is assumed to be a constant derived from the mean recorded value in each corner of the data images. GAIN.cbf codes for the expected variation of the pixel contents in the background region of a data image. The variance of the contents of a pixel in the background region is GAIN·(pixel contents − detector noise). The variance is determined from the scatter of pixel values within a rectangular box (NBX=, NBY=) of size (2·NBX + 1)·(2·NBY + 1) centred at each image pixel in succession. The table GAIN.cbf is used to distinguish background pixels from ‘strong’ pixels that are part of a diffraction spot. BKGINIT.cbf estimates the initial background at each pixel from a few data images specified by the input parameter BACKGROUND_RANGE=. The lookup table is obtained by adding the X-ray background from each image. Shaded regions on the detector (e.g. from the beamstop), pixels outside a user-defined circular region (TRUSTED_REGION=) or pixels with an undefined spatial correction value are classified as untrustworthy and marked by −3. The following problem can be encountered in this step. Some detectors with insufficient protection from electromagnetic pulses may generate badly spoiled images whose inclusion leads to a completely wrong X-ray background table. These images can be identified in INIT.LP by their unexpectedly high mean pixel contents and this step should be repeated with a different set of images.

2.3. COLSPOT

COLSPOT locates strong diffraction spots occurring in a subset of the data images and saves their centroids in the file SPOT.XDS. The data subset is defined by contiguous image number ranges, where each range is specified by the keyword SPOT_RANGE=. As described in Kabsch (2010), spots are defined as sets of ‘strong’ pixels that are adjacent in three dimensions.
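The background variance model given for the INIT step, GAIN·(pixel contents − detector noise), and the box size (2·NBX + 1)·(2·NBY + 1) can be written out as a short sketch. The function names and the threshold `n_sigma` are hypothetical and introduced only for illustration; the exact decision rule applied by XDS is not reproduced here.

```python
def box_size(nbx, nby):
    """Number of pixels in the rectangular box centred on a pixel."""
    return (2 * nbx + 1) * (2 * nby + 1)


def background_variance(gain, pixel_contents, detector_noise):
    """Expected variance of a background pixel: GAIN * (pixel contents - detector noise)."""
    return gain * (pixel_contents - detector_noise)


def is_strong(pixel_value, background, gain, detector_noise, n_sigma=3.0):
    """Assumed rule: a pixel is 'strong' if it exceeds the local background
    by more than n_sigma expected standard deviations (n_sigma is hypothetical)."""
    variance = max(background_variance(gain, background, detector_noise), 0.0)
    return pixel_value - background > n_sigma * variance ** 0.5


print(box_size(3, 3))                         # pixels in a 7 x 7 box
print(background_variance(1.0, 120.0, 20.0))  # variance for a background of 120 counts
print(is_strong(160.0, 120.0, 1.0, 20.0))
```

With a gain of 1, a background of 120 counts and a detector noise of 20 counts, the expected standard deviation is 10 counts, so a pixel at 160 counts lies four standard deviations above the background.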
The classification of ‘strong’ pixels is controlled by the decision constants STRONG_PIXEL= and BACKGROUND_PIXEL=. If the total number of ‘strong’ pixels occurring in the specified data images exceeds the upper limit as given by the input parameter MAXIMUM_NUMBER_OF_STRONG_PIXELS=, the weaker ones are discarded. A spot is accepted if it contains a minimum number of ‘strong’ pixels (MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=) and if the spot centroid is sufficiently close to the location of the strongest pixel in the spot (SPOT_MAXIMUM-CENTROID=). The following problem can be encountered in this step. Sharp edges such as ice rings in the images can lead to an excessive number of ‘strong’ pixels being erroneously classified as contributing to diffraction spots. These aliens could prevent IDXREF from recognizing the crystal lattice.

2.4. IDXREF

IDXREF uses the initial parameters describing the diffraction experiment as provided by XDS.INP and the observed centroids of the spots from the file SPOT.XDS to find the orientation, metric and symmetry of the crystal lattice and refines all or a specified subset of these parameters [input parameter REFINE(IDXREF)=]. On return, the complete set of parameters is saved in the file XPARM.XDS and the original file SPOT.XDS is replaced by a file of identical name, now with indices attached to each observed spot. Spots not belonging to the crystal lattice are given indices 0, 0, 0. XDS considers the run to be successful if the coordinates of at least 70% of the given spots can be explained with reasonable accuracy (input parameter MAXIMUM_ERROR_OF_SPOT_POSITION=); otherwise, XDS will stop with an error message. Alien spots often arise because of the presence of ice or small satellite crystals and continuation of data processing may still be meaningful. In this case, XDS is called again with an explicit list of the subsequent steps specified in XDS.INP (input parameter JOB=DEFPIX XPLAN INTEGRATE CORRECT).
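The 70% success criterion can be made concrete with a small sketch. It assumes that a spot counts as explained exactly when its indices differ from 0, 0, 0, which simplifies the accuracy test controlled by MAXIMUM_ERROR_OF_SPOT_POSITION=; the function names are hypothetical and the parsing of SPOT.XDS is deliberately left out.

```python
def indexed_fraction(indices):
    """Fraction of spots that carry indices other than (0, 0, 0)."""
    if not indices:
        return 0.0
    indexed = sum(1 for hkl in indices if hkl != (0, 0, 0))
    return indexed / len(indices)


def indexing_accepted(indices, min_fraction=0.70):
    """Assumed check: at least min_fraction of the given spots are explained."""
    return indexed_fraction(indices) >= min_fraction


# Hypothetical spot lists: eight lattice spots plus two aliens, then six plus four.
mostly_lattice = [(1, 0, 2)] * 8 + [(0, 0, 0)] * 2
many_aliens = [(1, 0, 2)] * 6 + [(0, 0, 0)] * 4

print(indexing_accepted(mostly_lattice))
print(indexing_accepted(many_aliens))
```

In the second case the run would stop, and the text above describes how processing can nevertheless be continued by listing the remaining steps under JOB=.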
IDXREF uses the methods described in Kabsch (1993 ▶, 2010 ▶) to determine a crystal lattice that explains the observed locations of the diffraction spots listed in the file SPOT.XDS. Firstly, a reciprocal-lattice vector referring to the unrotated crystal is computed from each observed spot centroid. Differences between any two reciprocal-lattice vectors that are above a specified minimal length (SEPMIN=) are accumulated in a three-dimensional histogram. These difference vectors will form clusters in the histogram, since there are many different pairs of reciprocal-lattice vectors of nearly identical vector difference. The clusters are found as maxima in the smoothed histogram (CLUSTER_RADIUS=) and a basis of three linearly independent cluster vectors is selected that allows all other cluster vectors to be expressed as nearly integral multiples of small magnitude with respect to this basis. The basis vectors and the 60 most populated clusters with attached indices are listed in IDXREF.LP. If many of the indices deviate significantly from integral values, the program is unable to find a reasonable lattice basis and all further processing will be meaningless. If the space group and unit-cell parameters are specified, a reduced cell is derived and the reciprocal-basis vectors found above are reinterpreted accordingly; otherwise, a reduced cell is determined directly from the reciprocal basis. The parameters of the reduced cell, the coordinates of the reciprocal-basis vectors and their indices with respect to the reduced cell are reported. Based on the orientation and metric of the reduced cell now available, IDXREF indexes up to 3000 of the strongest spots using the local-indexing method. This method considers each spot as a node of a tree and identifies the largest subtree of nodes which can be assigned reliable indices. 
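The accumulation of difference vectors described above can be sketched as follows. This is a simplified illustration and not the IDXREF implementation: vectors are binned on a coarse grid instead of being smoothed with CLUSTER_RADIUS=, a difference and its negative are counted in the same bin (an assumption made here for compactness), and the names `difference_vector_histogram`, `sepmin` and `bin_width` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def difference_vector_histogram(vectors, sepmin, bin_width):
    """Count pairwise difference vectors longer than sepmin in a 3D grid of bins."""
    histogram = Counter()
    for a, b in combinations(vectors, 2):
        diff = tuple(x - y for x, y in zip(a, b))
        if math.sqrt(sum(c * c for c in diff)) < sepmin:
            continue                                   # too short to be a lattice translation
        key = tuple(round(c / bin_width) for c in diff)
        mirrored = tuple(-k for k in key)
        histogram[max(key, mirrored)] += 1             # treat d and -d as the same cluster
    return histogram


# Hypothetical reciprocal-lattice points on a 3 x 3 planar grid with spacing 0.1.
points = [(0.1 * h, 0.1 * k, 0.0) for h in range(3) for k in range(3)]
hist = difference_vector_histogram(points, sepmin=0.05, bin_width=0.05)
print(hist.most_common(2))
```

The most populated bins correspond to the two shortest lattice translations, which is the property exploited when a basis of three linearly independent cluster vectors is selected.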
The number of reflections in the ten largest subtrees is reported and usually shows a dominant first tree corresponding to a single lattice, whereas alien spots are found in small subtrees. Reflections in the largest subtree are used for initial refinement of the basis vectors of the reduced cell, the incident-beam wavevector and the origin of the detector, which is the point in the detector plane nearest to the crystal. Experience has shown that the detector origin and the direction of the incident beam are often specified with insufficient accuracy, which could easily lead to a misindexing of the reflections by a constant offset. For this reason, IDXREF considers alternative choices for the index origin and reports their likelihood of being correct. The parameters controlling the local indexing are INDEX_ERROR=, INDEX_MAGNITUDE=, INDEX_QUALITY= (corresponding to ∊, ϕ and 1 − ℓmin in Kabsch, 2010 ▶) and INDEX_ORIGIN=h 0, k 0, l 0, which is added to the indices of all reflections in the tree. After initial refinement based on the reflections in the largest subtree, all spots which can now be indexed are included. Usually, the detector distance and the direction of the rotation axis are not refined, but if the spots were extracted from images covering a large range of total crystal rotation then better results are obtained by including these parameters in the refinement [REFINE(IDXREF)=] . The refined metric parameters of the reduced cell are used to test each of the 44 possible lattice types as described in Kabsch (2010 ▶). For each lattice type, IDXREF reports the likelihood of its being correct and the conventional unit-cell parameters. The program step concludes with an overview of possible lattice symmetries, but makes no automatic decision for the space group. If the crystal symmetry is unknown, XDS will continue data processing with the crystal being described by its reduced-cell basis vectors and triclinic symmetry. 
Space-group assignment is postponed to the last program step, CORRECT, when integrated intensities are available. The following problems can be encountered in this step. (i) The indices of many difference-vector clusters deviate significantly from integral values. This can be caused by incorrect input parameters, such as rotation axis, oscillation angle or detector position, by a large fraction of alien spots in SPOT.XDS, by placing the detector too close to the crystal or by an inappropriate choice of the parameters SEPMIN= and CLUSTER_RADIUS= in densely populated images. (ii) Indexing and refinement are unsatisfactory despite well indexed difference-vector clusters. This is probably caused by the selection of an incorrect index origin and IDXREF should be rerun with plausible alternatives for INDEX_ORIGIN= after a visual check of a data image with XDS-Viewer. (iii) Despite successful indexing and refinement, IDXREF stops with the error message INSUFFICIENT PERCENTAGE OF INDEXED REFLECTIONS, complaining that fewer than 70% of the given spots could be explained. Alien spots often arise because of the presence of ice or small satellite crystals and continuation of data processing may still be meaningful. To continue data processing, just specify the missing processing steps in XDS.INP by JOB=DEFPIX XPLAN INTEGRATE CORRECT and call XDS again.

2.5. DEFPIX

DEFPIX recognizes regions in the initial background table (file BKGINIT.cbf) that are obscured by intruding hardware and marks the shaded pixels as untrusted. In addition, pixels that are outside a user-defined resolution range (INCLUDE_RESOLUTION_RANGE=) are marked and eliminated from the trusted region. The marked background table that is thus obtained is saved in the file BKGPIX.cbf, which is needed by the subsequent program steps.
To recognize the obscured regions in the initial background, DEFPIX generates a control image (file ABS.cbf) that contains values of around 10 000 for unshaded pixels and lower values for shaded pixels. The classification of the pixels into reliable and untrusted pixels is based on the two input parameters VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= (default 6000 30000) and INCLUDE_RESOLUTION_RANGE= (default 20.0 0.0). Pixels in the table ABS.cbf with a value outside the ranges specified by the two parameters are marked unreliable (by −3) in the background table BKGPIX.cbf. The following problem can be encountered in this step. If the parameter VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= specifies a value range that is too narrow, ‘good’ regions will erroneously be excluded from the trusted detector region. Check BKGPIX.cbf with the XDS-Viewer program and if necessary repeat the DEFPIX step with more appropriate values.

2.6. XPLAN

XPLAN supports the planning of data collection. It is based upon information provided by the input files XPARM.XDS and BKGPIX.cbf, both of which become available on processing a few test images with XDS. XPLAN estimates the completeness of new reflection data expected to be collected for each given starting angle and total crystal rotation and reports the results for a number of selected resolution shells in the file XPLAN.LP. To minimize the recollection of data, the name of a file containing already measured reflections can be provided by the input parameter REFERENCE_DATA_SET=. The following problems can be encountered in this step. (i) Incorrect results may occur for some space groups, e.g. P4₂, if the unit cell determined by XDS from processing a few test images implicates reflection indices that are inconsistent with those from the reference data set. 
However, the correct cell choice can be found by using the old data as a reference and repeating CORRECT with the appropriate reindexing transformation, followed by copying GXPARM.XDS to XPARM.XDS. The same applies if IDXREF was run for an unknown space group and then reindexed in CORRECT. (ii) XPLAN ignores potential reflection overlap owing to the finite oscillation range covered by each image. 2.7. INTEGRATE INTEGRATE determines the intensity of each reflection predicted to occur in the rotation data images (DATA_RANGE=) and saves the results in the file INTEGRATE.HKL. The diffraction parameters needed to predict the reflection positions are initially provided by the file XPARM.XDS. These parameters are either kept constant or refined periodically using strong diffraction spots encountered in the data images. Whether refinement should be carried out at all and which parameters are to be refined can be specified by the user [input parameter REFINE(INTEGRATE)=]. The centroids of the strong spots in the data images are computed from pixels that exceed the background by a given multiple of standard deviations (input parameters SIGNAL_PIXEL=, BACKGROUND_PIXEL=). Strong spots are used in the refinement if their centroids are reasonably close to their calculated position (input parameter MAXIMUM_ERROR_OF_SPOT_POSITION=). For determination of the intensity, approximate values describing the extension and the form of the diffraction spot must be specified. The shapes of all spots become very similar when the contents of each of their contributing image pixels is mapped onto a three-dimensional coordinate system, specific for each reflection, which has its origin on the surface of the Ewald sphere at the terminus of the diffracted beam wavevector (see Kabsch, 2010 ▶). 
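The selection of strong spots for refinement described above can be illustrated with a brief sketch. It assumes an intensity-weighted centroid over the pixels already classified as signal and a simple Euclidean distance test against the calculated position; both choices, as well as the function names, are assumptions made for illustration and are not taken from the XDS source.

```python
import math


def spot_centroid(pixels):
    """Intensity-weighted centroid of (x, y, counts_above_background) triples."""
    total = sum(w for _, _, w in pixels)
    cx = sum(x * w for x, _, w in pixels) / total
    cy = sum(y * w for _, y, w in pixels) / total
    return cx, cy


def usable_for_refinement(observed, calculated, max_error):
    """Keep a spot only if its centroid lies within max_error pixels of the prediction."""
    return math.hypot(observed[0] - calculated[0], observed[1] - calculated[1]) <= max_error


# Hypothetical spot made of two signal pixels.
centroid = spot_centroid([(10, 20, 1.0), (12, 20, 3.0)])
print(centroid)
print(usable_for_refinement(centroid, (12.5, 20.0), max_error=3.0))
```

Here `max_error` plays the role that the text assigns to MAXIMUM_ERROR_OF_SPOT_POSITION=.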
The transformed spot can roughly be described as a Gaussian involving two parameters: the standard deviation of the reflecting range σM (input parameter REFLECTING_RANGE_E.S.D.=σM) and that of the beam divergence σD (input parameter BEAM_DIVERGENCE_E.S.D.=σD). This leads to an integration region around the spot that is defined by the parameters δM (REFLECTING_RANGE=) and δD (BEAM_DIVERGENCE=), which are typically chosen to be 6–10 times larger than σM and σD, respectively. Appropriate values for these parameters are determined automatically by XDS (Kabsch, 2010); the user has the option to override the automatic assignments. Integration is carried out by a two-step procedure. In the first pass, spot templates are generated by superimposing the profiles of strong reflections after their mapping to the Ewald sphere. Grid points with a value above a minimum percentage of the maximum in the template (parameter CUT=) are marked for inclusion in the final integration. To allow for variations in their shape, profile templates are generated from reflections located at nine regions of equal size covering the detector surface and additional sets of nine to cover equally sized (parameter DELPHI=) batches of images. The actual integration is carried out in the second pass by profile fitting with respect to the spot shape determined in the first pass. Incomplete reflections below a minimum percentage of the observed reflection intensity (parameter MINPK=) will be discarded. Otherwise, the missing intensity is estimated from the learned reflection profiles. On return from the INTEGRATE step, all spots expected to occur in the last data image are encircled and the modified image is saved as the file FRAME.cbf for inspection. The following problems can be encountered in this step. (i) Off-centred profiles indicate that reflection positions were predicted incorrectly from the parameters provided by the file XPARM.XDS (i.e. misindexing caused by a wrong origin of the indices), crystal slippage or a change in the incident-beam direction. (ii) Profiles extending to the borders of the box indicate too-small values of the parameters BEAM_DIVERGENCE= or REFLECTING_RANGE=. This leads to incorrect integrated intensities because of truncated reflection profiles and unreliable background determination. (iii) Display of the file FRAME.cbf shows spots which are not encircled. If these unexpected reflections are not close to the spindle and are not ice reflections, then it is likely that the parameters provided by the file XPARM.XDS are wrong.

2.8. CORRECT

CORRECT applies correction factors to the intensities and standard deviations of all reflections found in the file INTEGRATE.HKL, determines the space group if unknown and refines the unit-cell parameters, reports the quality and completeness of the data set and saves the final integrated intensities in the file XDS_ASCII.HKL. Some of the employed algorithms are new and are described in Kabsch (2010). CORRECT accepts reflections from the file INTEGRATE.HKL that are (i) recorded (parameter MINPK=) on specified images (parameter DATA_RANGE=); (ii) within a given resolution range (parameter INCLUDE_RESOLUTION_RANGE=); (iii) outside ice rings (parameter EXCLUDE_RESOLUTION_RANGE=); (iv) not overloaded (parameter OVERLOAD=); and (v) not marked for exclusion in the file REMOVE.HKL. Thus, the user has the option to exclude unreliable reflections from the final data set by repeating the CORRECT step with appropriate parameter values. The intensities of the accepted reflections are first corrected for effects arising from polarization of the incident beam (parameters FRACTION_OF_POLARIZATION=, POLARIZATION_PLANE_NORMAL=) and absorption effects (parameters AIR=, SILICON=, SENSOR_THICKNESS=) arising from differences in path lengths of the diffracted beam. These corrections do not depend on knowledge of the space group.
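The acceptance criteria (i) to (v) listed above can be pictured as a simple filter. The following is only an illustrative sketch, not the code used by CORRECT: the `Reflection` record, its field names and the `accept` function are hypothetical, the overload test is reduced to a boolean flag, and resolution ranges are assumed to be given as (low-resolution limit, high-resolution limit) in ångström.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    # Hypothetical record; the real INTEGRATE.HKL layout is not modelled here.
    hkl: tuple
    image: int               # image on which the reflection was recorded
    resolution: float        # d-spacing in angstrom
    percent_observed: float  # share of the profile that was recorded
    overloaded: bool         # stands in for the OVERLOAD= test


def accept(r, data_range, include_range, exclude_ranges, minpk, removed):
    """Return True if the reflection passes criteria (i) to (v)."""
    first, last = data_range
    low, high = include_range  # assumed order: low-resolution limit first
    if r.percent_observed < minpk:                 # (i) MINPK=
        return False
    if not first <= r.image <= last:               # (i) DATA_RANGE=
        return False
    if not high <= r.resolution <= low:            # (ii) INCLUDE_RESOLUTION_RANGE=
        return False
    if any(lo >= r.resolution >= hi for lo, hi in exclude_ranges):
        return False                               # (iii) EXCLUDE_RESOLUTION_RANGE=
    if r.overloaded:                               # (iv) OVERLOAD=
        return False
    if r.hkl in removed:                           # (v) listed in REMOVE.HKL
        return False
    return True
```

Repeating CORRECT with different parameter values corresponds to calling such a filter again with different arguments on the same unchanged list of integrated reflections.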
The integrated intensities of the reflections in the file INTEGRATE.HKL may or may not have been indexed in the correct space group; for the purpose of integration, it is important only that all reflections occurring in the data images have been indexed with respect to some unit-cell basis and that their locations on the images were predicted accurately. The correct reflection indices in the true space group are always a linear transformation of the original indices used in INTEGRATE.HKL. All lattices consistent with the locations of the reflections saved in INTEGRATE.HKL (decision parameters MAX_CELL_AXIS_ERROR=, MAX_CELL_ANGLE_ERROR=) and their corresponding linear transformations are printed to provide a useful overview similar to that shown in IDXREF.LP. If the space group is not specified, XDS proposes one of the enantiomorphous space groups without screw axes that is compatible with the observed lattice symmetry and explains the intensities of a subset of the reflections (parameter TEST_RESOLUTION_RANGE=) at an acceptable R_meas (Diederichs & Karplus, 1997; Weiss, 2001) using a minimum number of unique reflections. The criteria for an acceptable R_meas are controlled by the decision parameters MIN_RFL_Rmeas= and MAX_FAC_Rmeas=. The user can always override the automatic decisions by specifying the correct space-group number (parameter SPACE_GROUP_NUMBER=) and unit-cell parameters (parameter UNIT_CELL_CONSTANTS=) in XDS.INP and repeating the CORRECT step. This provides a simple way to rename orthorhombic unit-cell parameters, which often becomes necessary if screw axes are present.
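The statement that the correct indices are a linear transformation of the original ones can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the example matrix are hypothetical, and the input format of the XDS reindexing parameter is not modelled.

```python
def reindex(hkl, matrix):
    """Apply a 3x3 integer matrix to a reflection index (h, k, l)."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in matrix)


def determinant(m):
    """Determinant of a 3x3 matrix; +1 means handedness is preserved."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Example transformation (an assumption for illustration): exchange the
# first two axes and invert the third so the basis stays right-handed.
SWAP_AB = ((0, 1, 0),
           (1, 0, 0),
           (0, 0, -1))

new_index = reindex((1, 2, 3), SWAP_AB)  # (2, 1, -3)
```

Because the transformation acts on the indices only, the integrated intensities themselves need not be recomputed when the cell is renamed or the space group is changed.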
In addition, the user has the option to specify the following in XDS.INP: (i) a reference data set (parameter REFERENCE_DATA_SET=), (ii) a reindexing transformation (parameter REIDX=) and (iii) three basis vectors if known from processing a previous data set taken at the same crystal orientation in a multi-wavelength experiment (parameters UNIT_CELL_A-AXIS=, UNIT_CELL_B-AXIS=, UNIT_CELL_C-AXIS=). The possibility of comparing the new data with a reference data set is particularly useful for resolving the issue of alternative settings of polar or rhombohedral cells (such as P4, P6 and R3). Also, reference data are quite useful for recognizing misindexing or for testing potential heavy-atom derivatives. For refinement of the unit-cell parameters [parameter REFINE(CORRECT)=], CORRECT uses a subset of the accepted reflections whose observed centroid is sufficiently close to the predicted spot position (parameter MAXIMUM_ERROR_OF_SPOT_POSITION=). The refined set of parameters is saved in the file GXPARM.XDS, which has an identical layout to the file XPARM.XDS produced by IDXREF. If the crystal has not slipped during data collection, these parameters are quite accurate. Other correction factors (parameter CORRECTIONS=) which partially compensate for radiation damage, absorption effects and variations in the sensitivity of the detector surface are determined from the symmetry-equivalent reflections usually found in the data images. The corrections are chosen such that the integrated intensities of symmetry-equivalent reflections come out as similar as possible. The user may control application of the various corrections by specifying the parameter CORRECTIONS= by a combination of the keywords DECAY MODULATION ABSORPTION. Whether Friedel pairs are considered as symmetry-equivalent reflections in the calculation of the correction factors depends on the values of the two parameters STRICT_ABSORPTION_CORRECTION= and FRIEDEL’S_LAW=.
The number of correction factors is controlled by the input parameters MINIMUM_I/SIGMA=, NBATCH= and REFLECTIONS/CORRECTION_FACTOR=. The residual scatter in intensity of symmetry-equivalent reflections is used to estimate their standard deviations. Here, the initial estimate $v_0(I)$ (obtained from the INTEGRATE step) for the variance of the reflection intensity $I$ is replaced by $v(I) = a[v_0(I) + bI^2]$. The two constants $a$ and $b$ are chosen to minimize discrepancies between $v(I)$ and the variance estimated from sample statistics of symmetry-related reflections. Based on the more realistic error estimates for the intensities, outliers are recognized by comparison with other symmetry-equivalent reflections. These outliers are included in the main output file XDS_ASCII.HKL, in which they are marked by a negative sign attached to the estimated standard deviations of their intensity. Classification of a reflection as a misfit is controlled by a decision constant which has the default value of WFAC1=1.5. Specification of a lower value such as WFAC1=1.0 by the user will lead to an increasing number of misfits and lower R factors as outliers are not included in the reported statistics. Data quality as a function of resolution is described by the agreement of intensities of symmetry-related reflections and quantified by the R factors R_merge and the more robust indicator R_meas (Diederichs & Karplus, 1997; Weiss, 2001). These R factors as well as the intensities of all reflections with indices of type h00, 0k0 and 00l and those expected to be systematically absent provide important information for identification of the correct space group. Clearly, large R factors or many rejected reflections or large observed intensities for reflections that are expected to be systematically absent suggest that the assumed space group or indexing is incorrect. The presence or absence of anomalous scatterers is specified by the parameter FRIEDEL’S_LAW=.
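The error model and the two R factors mentioned above can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, the constants a and b are taken as given rather than fitted, and the R factors follow the commonly used definitions, in which the multiplicity-weighted variant scales each group's deviations by sqrt(n/(n-1)).

```python
from math import sqrt


def corrected_variance(v0, intensity, a, b):
    """Error model from the text: v(I) = a * (v0(I) + b * I**2)."""
    return a * (v0 + b * intensity ** 2)


def r_factors(groups):
    """Return (R_merge, R_meas) for groups of symmetry-equivalent intensities.

    Each element of `groups` is a list of observed intensities for one
    unique reflection. Groups with a single observation carry no
    information about scatter and are skipped.
    """
    num_merge = num_meas = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue
        mean = sum(obs) / n
        deviation = sum(abs(i - mean) for i in obs)
        num_merge += deviation
        num_meas += sqrt(n / (n - 1)) * deviation  # multiplicity weighting
        denom += sum(obs)
    return num_merge / denom, num_meas / denom


r_merge, r_meas = r_factors([[100.0, 110.0], [50.0, 50.0, 56.0]])
```

Since sqrt(n/(n-1)) is greater than 1 for every group, R_meas is never smaller than R_merge computed from the same observations.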
Finally, CORRECT analyzes the distribution of reflection intensities as a function of their resolution and reports outliers from the Wilson plot. Often, these aliens arise from ice rings in the data images. To suppress the unwanted reflections from the final output file XDS_ASCII.HKL, the user copies them to a file named REMOVE.HKL in the current directory and repeats the CORRECT step. The following problems can be encountered in this step. (i) Incomplete data sets may lead to wrong conclusions about the space group, as some of its symmetry operators might not be involved in the R-factor calculations. (ii) Often, the CORRECT step is repeated several times. It should be remembered that XDS overwrites earlier versions of the output files XDS_ASCII.HKL, GXPARM.XDS etc.

3. XSCALE

The scaling program XSCALE (i) puts one or more files obtained from data processing with XDS on a common scale and reports the completeness and quality of the data sets; (ii) offers a choice of either combining symmetry-equivalent observations into a single unique reflection or saving the scaled but unmerged observations in the output file; (iii) allows several output files that are placed on the same scale, a feature that is recommended for MAD data sets taken from the same crystal at different wavelengths; (iv) determines correction factors that partially compensate for absorption effects, sensitivity variations in the detector plane and radiation damage; and (v) can correct reflections individually for radiation damage (Diederichs et al., 2003). The program uses a new fast algorithm (Kabsch, 2010) and imposes no limitations on the number of data sets or scaling/correction factors. The easiest way to run XSCALE is to copy a template input file named XSCALE.INP to a new directory and to replace the parameter values by the appropriate values describing the actual scaling run.
The input parameters may be given in arbitrary order, except for the parameters defining the input and output reflection files (INPUT_FILE=, OUTPUT_FILE=). Here, an output file is defined first by the parameter OUTPUT_FILE= that will include the scaled and merged reflections from all following input files specified by the parameters INPUT_FILE= until the next occurrence of OUTPUT_FILE= in XSCALE.INP. An arbitrary number of output files can be specified (together with their set of input files) in a single run of XSCALE. All output files are then on the same scale, which is a useful program feature for MAD data sets. The reflections in each output file will be unmerged and Friedel pairs will be considered to be different if this holds for all of the input data sets unless explicitly redefined by the parameters MERGE= and FRIEDEL’S_LAW=. Moreover, each output file accepts an additional parameter that controls how the Friedel pairs of the input files are treated in the calculation of the absorption correction factors. If STRICT_ABSORPTION_CORRECTION=FALSE, Friedel pairs are treated as symmetry-equivalent reflections in these calculations, which could lead to an underestimate of the anomalous differences in the presence of anomalous scatterers. Friedel pairs are only treated as different reflections in the calculations if STRICT_ABSORPTION_CORRECTION=TRUE and FRIEDEL’S_LAW=FALSE. For each input file, a resolution window for accepting reflections (INCLUDE_RESOLUTION_RANGE=), the extent of absorption corrections (CORRECTIONS=DECAY MODULATION ABSORPTION) and the number of correction factors (NBATCH=) can be specified. Finally, each input data set can be corrected for radiation damage by specifying the name of the crystal the data set was obtained from (CRYSTAL_NAME=). 
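The ordering rule for OUTPUT_FILE= and INPUT_FILE= can be made concrete with a small generator for the text of XSCALE.INP. This is a hedged sketch: the helper `xscale_inp`, the file names and the parameter values are hypothetical placeholders, and the one-parameter-per-line layout is an assumption; only the parameter names are taken from the description above (with a plain ASCII apostrophe in FRIEDEL'S_LAW).

```python
OUTPUT_KEYS = ("FRIEDEL'S_LAW", "MERGE", "STRICT_ABSORPTION_CORRECTION")
INPUT_KEYS = ("INCLUDE_RESOLUTION_RANGE", "CORRECTIONS", "NBATCH",
              "CRYSTAL_NAME")


def xscale_inp(outputs):
    """Build XSCALE.INP text: each output file precedes its input files."""
    lines = []
    for out in outputs:
        lines.append(f"OUTPUT_FILE={out['name']}")
        for key in OUTPUT_KEYS:
            if key in out:
                lines.append(f"{key}={out[key]}")
        for inp in out["inputs"]:
            lines.append(f"INPUT_FILE={inp['name']}")
            for key in INPUT_KEYS:
                if key in inp:
                    lines.append(f"{key}={inp[key]}")
    return "\n".join(lines) + "\n"


# Two output files on a common scale, e.g. two wavelengths of a MAD
# experiment (file names and values are placeholders).
text = xscale_inp([
    {"name": "peak.ahkl", "FRIEDEL'S_LAW": "FALSE",
     "STRICT_ABSORPTION_CORRECTION": "TRUE",
     "inputs": [{"name": "../peak/XDS_ASCII.HKL", "NBATCH": 20,
                 "CORRECTIONS": "DECAY MODULATION ABSORPTION"}]},
    {"name": "remote.ahkl",
     "inputs": [{"name": "../remote/XDS_ASCII.HKL"}]},
])
```

Keeping the inputs nested under their output in the data structure mirrors the rule that every INPUT_FILE= belongs to the most recent OUTPUT_FILE= above it.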
Specification of this parameter implies zero-dose extrapolation of individual reflection intensities to compensate for the effects of radiation damage experienced by the crystal so far (see Diederichs et al., 2003). Each resulting scaled data set is of XDS_ASCII format. It can be converted into a CCP4-style multi-record MTZ file using the copy feature of the program POINTLESS (Evans, 2006), which is available from the web, or converted by XDSCONV into the format required by various structure-solution packages.

4. XDSCONV

XDSCONV accepts reflection-intensity data files as produced by XSCALE or CORRECT and converts them into the format required by software packages for structure determination. XDSCONV estimates structure-factor moduli based on the assumption that the intensity data set obeys Wilson’s distribution and uses a Bayesian approach to statistical inference as described by French & Wilson (1978). The output file generated may inherit the test reflections previously used to calculate a free R factor (Brünger, 1992) or may contain new test reflections selected by XDSCONV.

5. Parallelization of XDS

In order to efficiently use modern multiprocessor hardware, a major effort has been undertaken to replace the original code of XDS by routines that can run concurrently with very little need for synchronization. As described above, data processing by XDS is organized into eight steps that must be executed in a fixed order since the result of each step is needed as input for the subsequent ones. Thus, the only way to speed up processing is to make each step faster. The most computationally intensive steps are COLSPOT and INTEGRATE and, to a lesser degree, the routine that refines diffraction parameters in IDXREF and CORRECT. Thus, the highest savings in wall clock time are expected to result from changing these routines so that each one can make efficient use of the multiprocessor hardware. Two methods can be used (simultaneously) to speed up data processing.
In the first method, XDS divides the set of data images into approximately equal portions, calls a shell script that starts an independent job for processing each portion of images by the computer cluster and waits until all jobs have finished. The number of such independent jobs can be limited by the user (MAXIMUM_NUMBER_OF_JOBS=); up to 99 jobs are allowed. This method works even if the processors do not share the same address space since the jobs are independent processes that do not communicate at all. The second method uses OpenMP to control execution by a team of threads and relies on a shared-memory multiprocessor platform. This allows the program to exploit data parallelism at a more fine-grained level to speed up refinements and routines for setting up and solving systems of linear equations. The maximum number of threads that can be employed by the parallel version of XDS (xds_par, xscale_par) can be limited by the user (MAXIMUM_NUMBER_OF_PROCESSORS=); up to 32 processors can be used. OpenMP has been chosen for execution control because it hardly adds to the complexity of the program code and most importantly does not require the maintenance of separate versions of the source code depending on whether the program is intended for execution by a team of processors or just by a single CPU. Moreover, OpenMP has become the de facto standard and compilers accepting OpenMP directives are available for most shared-memory multiprocessor platforms. The new version of COLSPOT comprises an initial part, a concurrent procedure and a final part. After initialization each available processor is kept busy analyzing its share of rotation images for strong pixels, which are saved in a processor-specific file. In the final sequential part of COLSPOT all files resulting from the concurrent computations are read and the location addresses, image running numbers and signal values of the strong pixels are stored in a hash table. 
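The division of the data images into approximately equal portions, as used by the first method described above, can be sketched as follows. The function name and the exact rule for distributing leftover images are assumptions made for illustration; the text states only that the portions are approximately equal and that at most 99 jobs are allowed.

```python
MAX_JOBS = 99  # upper limit on independent jobs stated in the text


def split_images(first, last, n_jobs):
    """Split the inclusive image range [first, last] into contiguous portions.

    Portions differ in size by at most one image. The number of jobs is
    clamped to the range 1..MAX_JOBS and to the number of images.
    """
    total = last - first + 1
    n_jobs = max(1, min(n_jobs, MAX_JOBS, total))
    base, extra = divmod(total, n_jobs)
    portions = []
    start = first
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)  # early jobs take the remainder
        portions.append((start, start + size - 1))
        start += size
    return portions


portions = split_images(1, 10, 3)  # [(1, 4), (5, 7), (8, 10)]
```

Each portion could then be handed to an independent process; because the jobs do not communicate, no shared address space is needed.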
Strong pixels belonging to the same spot can be located rapidly in this table and the centroids of the spots are saved in the final output file from this step. For the INTEGRATE step, the rotation images are divided into approximately equal portions for independent processing under control by a shell script according to the first method described above. When all jobs have finished, the integrated intensities from each independent file are joined. Minor problems could occur for reflections that receive intensity contributions from images that have been processed by different jobs. Compared with processing as a single job, the observed intensity differences are small and disappear if the different jobs use identical reference profiles and diffraction parameters to predict spot locations [to avoid refinements, specify REFINE(INTEGRATE)=!]. In addition, each of the independent jobs can be executed by a team of processors controlled by OpenMP. The rotation images analyzed by each job are split into a sequence of batches of consecutive images that cover a total rotation range that is large enough to accommodate the integration domain. The batches are evaluated in strictly sequential order; parallel processing is confined to images within each batch. The restructured routine for the INTEGRATE step consists of code regions for parallel execution interspersed by sequential sections. After initialization, strong reflections and their mean size and extent are determined concurrently. The diffraction parameters are refined in parallel processing mode based on the observed spot locations. In the following sequential section a database is generated containing information about all reflections occurring in this batch of images. A subset of strong reflections is also identified that is useful for the subsequent reflection-profile learning pass. The mean profile of these reflections is determined concurrently in a second pass through the images in the batch.
Reflection integration by profile fitting is carried out in parallel in the third cycle through the batch. In the final sequential step the results from each job, which have been saved in files, are harvested and intensity contributions to the same reflection from adjacent batches are merged.

6. Availability

Documentation and executable versions of the XDS package for widely used computer systems running under Linux or OSX can be obtained from the XDS homepage free of charge for use by academics for noncommercial applications. Additional information is available online. For looking at rotation data images and control images generated by XDS, an open-source program XDS-Viewer written by Michael Hoffer can be obtained under the GNU General Public License. A graphical interface XDSi (Kursula, 2004) that simplifies the operation of XDS is also available.

                Author and article information

                [1] Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
                Author notes
                []Correspondence to: jinek@ (M.J.)

                Author contributions: C.A. designed experiments, performed site-directed mutagenesis, prepared guide RNAs, purified and crystallized the Cas9-sgRNA-target DNA complex, determined its structure together with M.J., and performed plasmid cleavage assays. O.N. purified Cas9 mutants, performed EMSA assays and assisted with cleavage assays. A.D. performed site-directed mutagenesis, prepared guide RNAs and assisted with cleavage assays. M.J. designed experiments and supervised the study. C.A. and M.J. wrote the manuscript, with input from remaining authors.

                Author information: Atomic coordinates and structure factors have been deposited in the Protein Data Bank under accession numbers 4un3, 4un4, 4un5. M.J. is a co-founder of Caribou Biosciences, Inc. The authors have filed a related patent application.

                14 July 2014
                27 July 2014
                25 September 2014
                25 March 2015
                Volume: 513
                Issue: 7519
                Pages: 569-573


