Blog
About

7
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Crystal Structure of the Full-Length Feline Immunodeficiency Virus Capsid Protein Shows an N-Terminal β-Hairpin in the Absence of N-Terminal Proline

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Feline immunodeficiency virus (FIV) is a member of the Retroviridae family. It is the causative agent of an acquired immunodeficiency syndrome (AIDS) in cats and wild felines. Its capsid protein (CA) drives the assembly of the viral particle, which is a critical step in the viral replication cycle. Here, the first atomic structure of full-length FIV CA to 1.67 Å resolution is determined. The crystallized protein exhibits an original tetrameric assembly, composed of dimers which are stabilized by an intermolecular disulfide bridge induced by the crystallogenesis conditions. The FIV CA displays a standard α-helical CA topology with two domains, separated by a linker shorter than other retroviral CAs. The β-hairpin motif at its amino terminal end, which interacts with nucleotides in HIV-1, is unusually long in FIV CA. Interestingly, this functional β-motif is formed in this construct in the absence of the conserved N-terminal proline. The FIV CA exhibits a cis Arg–Pro bond in the CypA-binding loop, which is absent in known structures of lentiviral CAs. This structure represents the first tri-dimensional structure of a functional, full-length FIV CA.

          Related collections

          Most cited references 53

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Features and development of Coot

          1. Introduction Macromolecular model building using X-ray data is an interactive task involving the iterative application of various optimization algorithms with evaluation of the model and interpretation of the electron density by the scientist. Coot is an interactive three-dimensional molecular-modelling program particularly designed for the building and validation of protein structures by facilitating the steps of the process. In recent years, initial construction of the protein chain has often been carried out using automatic model-building tools such as ARP/wARP (Langer et al., 2008 ▶), SOLVE/RESOLVE (Wang et al., 2004 ▶) and more recently Buccaneer (Cowtan, 2006 ▶). In consequence, relatively more time and emphasis is placed on model validation than has previously been the case (Dauter, 2006 ▶). The refinement and validation steps become increasingly important and also more time-consuming with lower resolution data. Coot aims to provide access to as many of the tools required in the iterative refinement and validation of a macromolecular structure as possible in order to facilitate those aspects of the process which cannot be performed automatically. A primary design goal has been to make the software easy to learn in order to provide a low barrier for scientists who are beginning to work with X-ray data. While this goal has not been met for every feature, it has played a major role in many of the design decisions that have shaped the software. The principal tasks of the software are the visualization of macromolecular structures and data, the building of models into electron density and the validation of existing models; these will be considered in the next three sections. The remaining sections of the paper will deal with more technical aspects of the software, including interactions with external software, scripting and testing. 2. Program design The program is constructed from a range of existing software libraries and a purpose-written Coot library which provides a range of tools specific to model building and visualization. The OpenGL and other graphics libraries, such as the X Window System and GTK+, provide the graphical user-interface functionality, the GNU Scientific Library (GSL) provides mathematical tools such as function minimizers and the Clipper (Cowtan, 2003 ▶) and MMDB (Krissinel et al., 2004 ▶) libraries provide crystallographic tools and data types. On top of these tools are the Coot libraries, which are used to manipulate models and maps and to represent them graphically. Much of this functionality may be accessed from the scripting layer (see §8), which allows programmatic access to all of the underlying functionality. Finally, the graphical user interface is built on top of the scripting layer, although in some cases it is more convenient for the graphical user interface to access the underlying classes directly (Fig. 1 ▶). 3. Visualization Coot provides tools for the display of three-dimensional data falling into three classes. (i) Atomic models (generally displayed as vectors connecting bonded atoms). (ii) Electron-density maps (generally contoured using a wire-frame lattice). (iii) Generic graphical objects (including the unit-cell box, noncrystallographic rotation axes and similar). A user interface and a set of controls allow the user to interact with the graphical display, for example in moving or rotating the viewpoint, selecting the data to be displayed and the mode in which those data are presented. The primary objective in the user interface as it stands today has been to make the application easy to learn. Current design of user interfaces emphasizes a number of characteristics for a high-quality graphical user interface (GUI). Such characteristics include learnability, productivity, forgiveness (if a user makes a mistake, it should be easy to recover) and aesthetics (the application should look nice and provide a pleasurable experience). When designing the user interface for Coot, we aim to respect these issues; however, this may not always be achieved and the GUI often undergoes redesign. Ideally, a user who has a basic familiarity with crystallographic data but who has never used Coot before should be able to start the software, display their data and perform some basic manipulations without any instruction. In order for the software to be easy to learn, it is necessary that the core functionality of the software be discoverable, i.e. the user should be able to find out how to perform common tasks without consulting the documentation. This may be achieved in any of three ways. (i) The behaviour is intuitive, i.e. the behaviour of user-interface elements can be either anticipated or determined by a few experiments. An example of this is the rotation of the view, which is accomplished by simply dragging with the mouse. (ii) The behaviour is familiar and consistent, i.e. user-interface elements behave in a similar way to other common software. An example of this is the use of a ‘File’ menu containing ‘Open…’ options, leading to a conventional file-selection dialogue. (iii) The interface is explorable, i.e. if a user needs an additional functionality they can find it rapidly by inspecting the interface. An example of this is the use of organized menus which provide access to the bulk of the program functionality. Furthermore, tooltips are provided for most menus and buttons and informative widgets explain their function. 3.1. User interface The main Coot user interface window is shown in Fig. 2 ▶ and consists of the following elements. (i) In the centre of the main window is the three-dimensional canvas, on which the atomic models, maps and other graphical objects are displayed. By default this area has a black background, although this can be changed if desired. (ii) At the top of the window is a menu bar. This includes the following menus: ‘File’, ‘Edit’, ‘Calculate’, ‘Draw’, ‘Measures’, ‘Validate’, ‘HID’, ‘About’ and ‘Extensions’. The ‘File’, ‘Edit’ and ‘About’ menus fulfill their normal roles. ‘Calculate’ provides access to model-manipulation tools. ‘Draw’ implements display options. ‘Measures’ presents access to geometrical information. ‘Validate’ provides access to validation tools. ‘HID’ allows the human-interface behaviour to be customized. ‘Extensions’ provides access to a range of optional functionalities which may be customized and extended by advanced users. Additional menus can be added by the use of the scripting interface. (iii) Between the menu bar and the canvas is a toolbar which provides two very frequently used controls: ‘Reset view’ switches between views of the molecules and ‘Display Manager’ opens an additional window which allows individual maps and molecules to be displayed in different ways. This toolbar is customizable, i.e. additional buttons can be added. (iv) On the right-hand side of the window is a toolbar of icons which allow the modification of atomic models. By default these are displayed as icons, although tooltips are provided and text can also be displayed. (v) Below the canvas is a status bar in which brief text messages are displayed concerning the status of current operations. The user interface is implemented using the GTK+2 widget stack, although with some work this could be changed in the future. 3.2. Controls User input to the program is primarily via mouse and keyboard, although it is also possible to use some dial devices such as the ‘Powermate’ dial. The mouse may be used to select menu options and toolbar buttons in the normal way. In addition, the mouse and the keyboard may be used to manipulate the view in the three-dimensional canvas using the controls shown in Fig. 3 ▶. In a large program there is often tension between software being easy to learn and being easy to use. A program which is easy to use provides extensive shortcuts to allow common tasks to be performed with the minimum user input. Keyboard shortcuts, customizations and macro languages are common examples and are often employed by expert users of all types of software. Coot now provides tools for all of these. Much of the functionality of the package is now accessible from both the Python (http://www.python.org) and the Scheme (Kelsey et al., 1998 ▶) scripting languages, which may be used to construct more powerful tools using combinations of existing functions. One example is a function often used after molecular replace­ment which will step through every residue in a protein, replace any missing atoms, find the best-fitting side-chain rotamer and perform real-space refinement. This function is in turn bound to a menu item, although it would also be possible to bind it to a key on the keyboard. 3.3. Lighting model The lighting model used in Coot is a departure from the approach adopted in most molecular-graphics software. It is difficult to illustrate a three-dimensional shape in a two-dimensional representation of an object. The traditional approach is to use so-called ‘depth-cueing’: objects closer to the user appear more brightly lit and more distant objects are more like the background colour (usually darker). In the Coot model, however, the most brightly lit features are just forward of the centre of rotation. This innovation was accidental, but has been retained because it seemed to provide a more natural image and has generated positive feedback from users once they become accustomed to the new behaviour. It is now possible to offer an explanation for this result. Depth-cueing is an algorithm which adjusts the colours of graphical objects according to their distance from the viewer. Depth-cueing is used in several ways. When rendering outdoor scenes, it is used to wash out the colours of distant features to simulate the effect of light scattering in the intervening air. When rendering darkened scenes, the same effect can be used to darken distant objects in order to create the effect that the viewer is carrying a light source which illuminates nearer objects more brightly than distant ones. Note that both of these usages assume a ‘first-person’ view: the observer is placed within the three-dimensional environment. This is also borne out in the controls for manipulating the view: when the view is rotated, the whole environment usually rotates about the observer. However, fitting three-dimensional atomic models to X-­ray data is a different situation. It is not useful to place the observer inside the model and rotate the model around them, not least because the scientist is usually more interested in looking at the molecule or electron density from the outside. As a result, it is normal to rotate the view not about the observer but rather about the centre of the feature being studied. Since the central feature is of most interest, it helps the visualization if it is the brightest entity. To properly light the model in this way is relatively slow, so in Coot an approximation is used and the plane perpendicular to the viewer that contains the central feature is most brightly lit. 3.4. Atomic model Coot displays the atoms of the atomic models as points on the three-dimensional canvas. If the points are within bonding distance then a line symbolizing a bond is drawn between the atomic points; otherwise the atoms are displayed as crosses. By default the atoms are coloured by element, with carbon yellow, oxygen red, nitrogen blue, sulfur green and hydrogen white. Bonds have two colours, with one half corresponding to each connecting atom. Additional atomic models are distinguished by different colour coding. The colour wheel is rotated and the element colours are adjusted accordingly. However, there is an option to fix the colours for noncarbon elements and the colour-wheel position can be adjusted for each molecule individually. Furthermore, Coot allows the user to colour the atomic model by molecule, chain, secondary structure, B factor and occupancy. Besides showing atomic models, Coot can also display Cα (or backbone) chains only. Again the model can be coloured in different modes, by chain, secondary structure or with rainbow colours from the N-terminus to the C-terminus. Currently, Coot offers some additional atomic representations in the form of different bond-width or ball-and-stick representation for selected residues. Information about individual atoms can be visualized in the form of labels. These show the atom name, residue number, residue name and chain identifier. Labels are shown upon Shift + left mouse click or double left mouse click on an atom (the atom closest to the rotation/screen centre can be labelled using the keyboard shortcut ‘l’). This operation not only shows the label beside the atom in the three-dimensional canvas, but also gives more detailed information about the atom, including occupancy, B factor and coordinates, in the status bar. Symmetry-equivalent atoms of the atomic model can be displayed in Coot within a certain radius either as whole chains or as atoms within this radius. Different options for colouring and displaying atoms or Cα backbone are provided. The symmetry-equivalent models can be labelled as described above. Additionally, the label will provide information about the symmetry operator used to generate the selected model. Navigation around the atomic models is primarily achieved with a GUI (‘Go To Atom…’). This allows the view to be centred on a particular atom by selection of a model, chain ID, residue number and atom name. Buttons to move to the next or previous residue are provided and are also available via keyboard shortcuts (space bar and Shift space bar, respectively). Furthermore, each chain is displayed as an expandable tree of its residues, with atoms that can be selected for centring. Additionally, a mouse can be used for navigation, so a middle mouse click centres on the clicked atom. A keyboard shortcut for the view to be centred on a Cα atom of a specific residue is provided by the use of Ctrl-g followed by input of the chain identifier and residue number (terminated by Enter). All atomic models, in contrast to other display objects, are accessible by clicking a mouse button on an atom centre. This allows, for example, re-centring, selection and labelling of the model. 3.5. Electron density Electron-density maps are displayed using a three-dimensional mesh to visualize the surface of electron-density regions higher than a chosen electron-density value using a ‘marching-cubes’-type algorithm (Lorensen & Cline, 1987 ▶). The spacing of the mesh is dictated by the spacing of the grid on which the electron density is sampled. Since electron-density maps are most often described in terms of structure factors, the sampling can be modified by the user at the point where the electron density is read into the program. The contour level may be varied interactively using the scroll wheel on the mouse (if available) or alternatively by using the keyboard (‘+’ and ‘-’). In most cases this avoids the need for multiple contour levels to be displayed at once, although additional contour levels can be displayed if desired. The colour of the electron-density map may be selected by the user. By default, the first map read into the program is contoured in blue, with subsequent maps taking successive colour hues around a colour wheel. Difference maps are by default contoured at two levels, one positive and one negative (coloured green and red, respectively). The electron density is contoured in a box about the current screen centre and is interactively re-contoured whenever the view centre is changed. By default, this box covers a volume extending at least 10 Å in each direction from the current screen centre. This is an appropriate scale for manipulating individual units of a peptide or nucleotide chain and provides good interactive performance, even on older computers. Larger volumes may be contoured on faster machines. A ‘dynamic volume’ option allows the volume contoured to be varied with the current zoom level, so that the contoured region always fills the screen. A ‘dynamic sampling’ option allows the map to be contoured on a subsampled grid (e.g. every second or fourth point along each axis). This is useful when using a solvent mask to visualize the packing of the molecules in the crystal. 3.6. Display objects There are a variety of non-interactive display objects which can also be superimposed on the atomic model and electron density. These include the boundaries of the unit cell, an electron-density ridge trace (or skeleton), surfaces, three-dimensional text annotations and dots (used in the MolProbity interface). These cannot be selected, but aid in the visualization of features of the electron density and other entities. 3.7. File formats Coot recognizes a variety of file formats from which the atomic model and electron density may be read. The differences in the information stored in these various formats mean that some choices have to be made by the user. This is achieved by providing several options for reading electron density and, where necessary, by requesting additional information from the user. The file formats which may be used for atomic models and for electron density will be considered in turn. In addition to obtaining data from the local storage, it is also possible to obtain atomic models directly from the Protein Data Bank (Bernstein et al., 1977 ▶) by entering the PDB code of a deposited structure. Similarly, in the case of structures for which experimental data have been deposited, the model and phased reflections may both be obtained from the Electron Density Server (Kleywegt et al., 2004 ▶). 3.7.1. Atomic models Atomic models are read into Coot by selecting the ‘Open Coordinates…’ option from the File menu. This provides a standard file selector which may be used to select the desired file. Coot recognizes atomic models stored in the following three formats. (i) Protein Data Bank (PDB) format (with file extension .pdb or .ent; compressed files of this format with extension .gz can also be read). The latest releases provide compatibility with version 3 of the PDB format. (ii) Macromolecular crystallographic information file (mmCIF; Westbrook et al., 2005 ▶) format (extension .cif). (iii) SHELX result files produced by the SHELXL refinement software (extension .res or .ins). In each case, the unit-cell and space-group information are read from the file (in the case of SHELXL output the space group is inferred from the symmetry operators). The atomic model is read, including atom name, alternate conformation code, monomer name, sequence number and insertion code, chain name, coordinates, occupancy and isotropic/anisotropic atomic displacement parameters. PDB and mmCIF files are handled using the MMDB library (Krissinel et al., 2004 ▶), which is also used for internal model manipulations. 3.7.2. Electron density The electron-density representation is a significant element of the design of the software. Coot employs a ‘crystal space’ representation of the electron density, in which the electron density is practically infinite in extent, in accordance with the lattice repeat and cell symmetry of the crystal. Thus, no matter where the viewpoint is located in space density can always be represented. This design decision is achieved by use of the Clipper libraries (Cowtan, 2003 ▶). The alternative approach is to just display electron density in a bounded box described by the input electron-density map. This approach is simpler and may be more appropriate in some specific cases (e.g. when displaying density from cryo-EM experiments or some types of NCS maps). However, it has the limitation that no density is available for symmetry-related molecules and if the initial map has been calculated with the wrong extent then it must be recalculated in order to view the desired regions. This distinction is important in that it affects how electron-density data should be prepared for use in Coot. Files pre­pared for O or PyMOL may not be suitable for use in Coot. In order to read a map file into Coot, it should cover an asymmetric unit or unit cell. In contrast, map files prepared for O (Jones et al., 1991 ▶) or PyMOL (DeLano, 2002 ▶) usually cover a bounded box surrounding the molecule. While it is possible to derive any bounded box from the asymmetric unit, it is not always possible to go the other way; therefore, using map files prepared for other software may lead to unexpected results in some cases, the most common being an incorrect calculation of the standard deviation of the map. If one uses more advanced techniques that involve masking, the electron-density map must have the same symmetry as the associated model molecule. Electron density may be read into Coot either in the form of structure factors (with optional weights) and phases or alternatively in the form of an electron-density map. There are a number of reasons why the preferred approach is to read reflection data rather than a map. (i) Coot can always obtain a complete asymmetric unit of data, avoiding the problems described above. (ii) Structure-factor files are generally smaller than electron-density maps. (iii) Some structure-factor files, and in particular MTZ files, provide multiple sets of data in a single file. Thus, it is possible to read a single file and obtain, for example, both best and difference maps. The overhead in calculating an electron-density map by FFT is insignificant for modern computers. 3.7.3. Reading electron density from a reflection-data file Two options are provided for reading electron density from a reflection-data file. These are ‘Auto Open MTZ…’ and ‘Open MTZ, mmcif, fcf or phs…’ from the ‘File’ menu. (i) ‘Auto Open MTZ…’ will open an MTZ file containing coefficients for the best and difference map, automatically select the FWT/PHWT and the DELFWT/DELPHWT pairs of labels and display both electron-density maps. Currently, suitable files are generated by the following software: Phaser (Storoni et al., 2004 ▶), REFMAC (Murshudov et al., 1997 ▶), phenix.refine (Adams et al., 2002 ▶), DM (Zhang et al., 1997 ▶), Parrot (Cowtan, 2010 ▶), Pirate (Cowtan, 2000 ▶) and BUSTER (Blanc et al., 2004 ▶). (ii) ‘Open MTZ, mmcif, fcf or phs…’ will open a reflection-data file in any of the specified formats. Note that XtalView .phs files do not contain space-group and cell information: in these cases a PDB file must be read first to obtain the relevant information or the information has to be entered manually. MTZ files may contain many sets of map coefficients and so it is necessary to select which map coefficients to use. In this case the user is provided with an additional window which allows the map coefficients to be selected. The standard data names for some common crystallographic software are provided in Table 1 ▶. SHELX .fcf files are converted to mmCIF format and the space group is then inferred from the symmetry operators. 4. Model building Initial building of protein structures from experimental phasing is usually accomplished by automated methods such as ARP/wARP, RESOLVE (Wang et al., 2004 ▶) and Buccaneer (Cowtan, 2006 ▶). However, most of these methods rely on a resolution of better than 2.5 Å and yield more complete models the better the resolution. The main focus in Coot, therefore, is the completion of initial models generated by either molecular replacement or automated model building as well as building of lower resolution structures. However, the features described below are provided for cases where an initial model is not available. 4.1. Tools for general model building 4.1.1. Cα baton mode Baton building, which was introduced by Kleywegt & Jones (1994 ▶), allows a protein main chain to be built by using a 3.8 Å ‘baton’ to position successive Cα atoms at the correct spacing. In Coot, this facility is coupled with an electron-density ridge-trace skeleton (Greer, 1974 ▶). Firstly, a skeleton is calculated which follows the ridges of the electron density. The user then selects baton-building mode, which places an initial baton with one end at the current screen centre. Candidate positions for the next α-carbon are highlighted as crosses selected from those points on the skeleton which lie at the correct distance from the start point. The user can cycle through a list of candidate positions using the ‘Try Another’ button or alternatively rotate the baton freely by use of the mouse. Additionally, the length of the baton can be changed to accommodate moderate shifts in the α-­carbon positions. Once a new position is accepted, the baton moves so that its base is on the new α-carbon. In this way, a chain may be traced manually at a rate of between one and ten residues per minute. 4.1.2. Cα zone→main chain Having placed the Cα atoms, the rest of the main-chain atoms may be generated automatically. This tool uses a set of 62 high-resolution structures as the basis for a library of main-chain fragments. Hexapeptide and pentapeptide fragments are chosen to match the Cα positions of each successive pentapeptide of the Cα trace in turn, following the method of Esnouf (1997 ▶), which is similar to that of Jones & Thirup (1986 ▶). The fragments with the best fit to the candidate Cα positions are merged to provide a full trace. After this step, one typically performs a real-space refinement of the subsequent main-chain model. 4.1.3. Find secondary structure Protein secondary-structure elements, including α-helices and β-strands, can be located by their repeating electron-density features, which lead to high and low electron-density values in characteristic positions relative to the consecutive Cα atoms. The ‘Find Secondary Structure’ tool performs a six-dimensional rotation and translation search to find the likely positions of helical and strand elements within the electron density. This search has been highly optimized in order to achieve interactive performance for moderately sized structures and as a result is less exhaustive than the corresponding tools employed in automated model-building packages: however, it can provide a very rapid indication of map quality and a starting point for model building. 4.1.4. Place helix here At low resolution it is sometimes possible to identify secondary-structure features in the electron density when the Cα positions are not obvious. In this case, Coot can fit an α-helix automatically. This process involves several stages. (i) A local optimization is performed on the starting position to maximize the integral of the electron density over a 5 Å sphere. This tends to move the starting point close to the helix axis. (ii) A search is performed to obtain the direction of the helix by integrating the electron density in a cylinder of radius 2.5 Å and length 12 Å. A two-dimensional orientation search is performed to optimize the orientation of the cylinder. This gives the direction of the helix. (iii) A theoretical α-helical model (including C, Cα, N and O atoms) is placed in the density in accordance with the position and direction already found. Different rotations of the model around the helix axis must be considered. Each of the resulting models is scored by the sum of the density at the atomic centres. At this stage the direction of the helix is unknown and so both directions are tested. (iv) Next, a choice is made between the best-fitting models for each helix direction by comparing the electron density at the Cβ positions. In case neither orientation gives a significant better fit for the Cβ atoms, both helices are presented to the user. (v) Finally, attempts are made to extend the helix from the N- and C-termini using ideal ϕ, ψ values. 4.1.5. Place strand here A similar method is used for placing β-strand fragments in electron density. However, there are three differences compared with helix placement: firstly the initial step is omitted, secondly the length of the fragment (number of residues) needs to be provided by the user and finally the placed fragments are obtained from a database. The first step (optimizing the starting position) is unreliable for strands owing to the smaller radius of the cylinder, i.e. main chain, combined with larger density deviations originating from the side chains. Hence, it is omitted and the user must provide a starting position in this case. The integration cylinder used in determining the orientation of the strand has a radius of 1 Å and a length of 20 Å. The ϕ, ψ torsion angles in β-strands in protein deviate from the ideal values, resulting in curved and twisted strands. Such strands cannot be well modelled using ideal values of ϕ and ψ; therefore, candidate strand fragments corresponding to the requested length are taken from a strand ‘database’ (top100 or top500; Word, Lovell, LaBean et al., 1999 ▶) and used in the search. 4.1.6. Ideal DNA/RNA Coot has a function to generate idealized atomic structures of single or double-stranded A-­form or B-form RNA or DNA given a nucleotide sequence. The function is menu-driven and can produce any desired helical nucleic acid coordinates in PDB format with canonical Watson–Crick base pairing from a given input sequence with the click of a single button. Because most DNA and RNA structures are comprised of at least local regions of regular near-ideal helical structural elements, the ability to generate nucleic acid helical models on the fly is of particular value for molecular replacement. Recently, a collection of short ideal A-form RNA helical fragments generated within Coot were used to solve a structurally complex ligase ribozyme by molecular replacement (Robertson & Scott, 2008 ▶). Using Coot together with the powerful molecular-replacement program Phaser (Storoni et al., 2004 ▶) not only permitted this novel RNA structure to be solved without resort to heavy-atom methods, but several other RNA and RNA/protein complexes were also subsequently determined using this approach (Robertson & Scott, 2007 ▶). Since Coot and Phaser can be scripted using embedded Python components, an automated and integrated phasing system is amenable for development within the current software framework. 4.1.7. Find ligands The automatic fitting of ligands into electron-density maps is a frequently used technique that is particularly useful for pharmaceutical crystallographers (see, for example, Williams et al., 2005 ▶). The mechanism in Coot addresses a number of ligand-fitting scenarios and is a modified form of a previously described algorithm (Oldfield, 2001 ▶). It is common practice in ‘fragment screening’ to soak different ligands into the same crystal (Blundell et al., 2002 ▶). Using Coot one can either specify a region in space or search a whole asymmetric unit for either a single or a number of different ligand types. In the ‘whole-map’ scenario, candidate ligand sites are found by cluster analysis of a residual map. The candidate ligands are fitted in turn to each site (with the candidate orientations being generated by matching the eigenvectors of the ligand to that of the cluster). Each candidate ligand is fitted and scored against the electron density. The best-fitting orientation of the ligand candidates is chosen. Ligands often contain a number of rotatable bonds. To account for this flexibility, Coot samples torsion angles around these rotatable bonds. Here, each rotatable bond is sampled from an independent probability distribution. The number of conformers is under user control and it is recommended that ligands with a higher number of rotatable bonds should be allowed more conformer candidates. Above a certain number of rotatable bonds it is more efficient to use a ‘core + fragment by fragment’ approach (see, for example, Terwilliger et al., 2006 ▶). 4.2. Rebuilding and refinement The rebuilding and refinement tools are the primary means of model manipulation in Coot and are all grouped together in the ‘Model/Fit/Refine’ toolset. These tools may be accessed either through a toolbar (which is usually docked on the right-hand side of the main window) or through a separate ‘Model/Fit/Refine’ window containing buttons for each of the toolbar functions. The core of the rebuilding and refinement tools is the real-space refinement (RSR) engine, which handles the refinement of the atomic model against an electron-density map and the regularization of the atomic model against geometric restraints. Refinement may be invoked both interactively, when executed by the user, and non-interactively as part of some of the automated fitting tools. The refinement and regularization tools are supplemented by a range of additional tools aimed at assisting the fitting of protein chains. These features are discussed below. 4.3. Tools for moving existing atoms 4.3.1. Real-space refine zone The real-space refine tool is the most frequently used tool for the refinement and rebuilding of atomic models and is also incorporated as a final stage in a number of other tools, e.g. ‘Add Terminal Residue…’. In interactive mode, the user selects the RSR button and then two atoms bounding a range of monomers (amino acids or otherwise). Alternatively, a single atom can be selected followed by the ‘A’ key to refine a monomer and its neighbours. All atoms in the selected range of monomers will be refined, including any flanking residues. Atoms of the flanking residues are marked as ‘fixed’ but are required to be added to the refinement so that the geometry (e.g. peptide bonds, angles and planes) between fixed and moving parts is also optimized. The selected atoms are refined against a target consisting of two terms: the first being the atomic number (Z) weighted sum of the electron-density values over all the atomic centres and the second being the stereochemical restraints. The progress of the refinement is shown with a new set of atoms displayed in white/pale colours. When convergence is reached the user is shown a dialogue box with a set of χ2 scores and coloured ‘traffic lights’ indicating the current geometry scores in each of the geometrical criteria (Fig. 4 ▶). Additionally, a warning is issued if the refined range contains any new cis-peptide bonds. At this stage the user may adjust the model by selecting an atom with the mouse and dragging it, whereby the other atoms will move with the dragged atom. Alternatively, a single atom may be dragged by holding the Ctrl key. As soon as the atoms are released, the selected atoms will refine from the dragged position. Optionally, before the start of refinement atoms may be selected to be fixed during the refinement (in addition to the atoms of the flanking residues). 4.3.2. Sphere refinement One of the problems with the refinement mode described above is that it only considers a linear range of residues. This can cause difficulties, with some side chains being inappropriately refined into the electron density of neighbouring residues, particularly at lower resolutions. Additionally, a linear residue selection precludes the refinement of entities such as disulfide bonds. Therefore, a new residue-selection mechanism was introduced to address these issues: the so-called ‘Sphere Refinement’. This mode selects residues that have atoms within a given radius of a specified position (typically within 4 Å of the centre of the screen). The selected residues are matched to the dictionary and any user-defined links (typically from the mon_lib_list.cif in the REFMAC dictionary), e.g. disulfide bonds, glycosidic linkages and formylated lysines. If such links are found and the (supposedly) bonded atoms are within 3 Å of each other then these extra link restraints are added into the refinement. 4.3.3. Ramachandran restraints At lower resolution it is sometimes difficult to obtain an acceptable fit of the model to the density and at the same time achieve a Ramachandran plot of high quality (most residues in favourable regions and less than 1% outliers). If a Ramachandran score is added to the target function then the Ramachandran plot can be improved. The analytical form for torsion gradients (∂θ/∂x 1 and so on) for each of the x, y, z positions of the four atoms contributing to the torsion angle has been reported previously (Emsley & Cowtan, 2004 ▶) (in the case of Ramachandran restraints, the θ torsions will be ϕ and ψ). The extension of the torsion gradients for use as Ramachandran restraints is performed in the following manner. Firstly, two-dimensional log Ramachandran plots R are generated as tables (one for each of the residue types Pro, Gly and non-Pro or Gly). Where the Ramachandran probability becomes zero the log probability becomes infinite and so it is replaced by values which become increasingly negative with distance from the nearest nonzero value. This provides a weak gradient in the disallowed regions towards the nearest allowed region. The log Ramachandran plot provides the following values and derivatives: The derivative of R with respect to the coordinates is required for the addition into the target geometry and is generated as (and so on for each of the x, y, z positions of the atoms in the torsion). Adding a Ramachandran score to the geometry target function is not without consequences. The Ramachandran plot has for a long time been used as a validation criterion, therefore if it is used in geometry optimization it becomes less informative as a validation metric. Kleywegt & Jones (1996 ▶) included the Ramachandran plot in the restraints during refinement using X-PLOR (Brünger, 1992 ▶) and reported that the number of Ramachandran outliers was reduced by about a third using moderate force constants. However, increasing the force constants by over two orders of magnitude only marginally decreased the number of outliers. As a result, Kleywegt and Jones note that the Ramachandran plot retains significant value as a validation tool even when it is also used as a restraint. Using the Ramachandran restraints as implemented in Coot with the default weights, the number of out­liers can be reduced from around 10% to 5% (typical values). 4.3.4. Regularize zone The ‘Regularize Zone’ option functions in the same way as ‘Real-Space Refine Zone’ except that in this case the model is refined with respect to stereochemical restraints but without reference to any electron density. 4.3.5. Rigid-body fit zone The ‘Rigid-Body Fit Zone’ option also follows a similar interface convention to the other refinement options. A range of atoms are selected and the orientation of the selected group is refined to best fit the density. In this case the density is the only contributor to the target function, since the geometry of the fragment is not altered. No constraints are placed on the bonding atoms. If atoms are dragged after refinement, no further refinement is performed on the fragment. 4.3.6. Rotate/translate zone Using this tool, the selected residue selection can be translated and rotated either by dragging it around the screen or through the use of user-interface sliders. No reference to the map is made. The rotation centre can be specified to be either the last atom selected or the centre of mass of the fragment rotated. Additionally, a selection of the whole chain or molecule can be transformed. 4.3.7. Rotamer tools Four tools are available for the fitting of amino-acid side chains. For a side chain whose amino-acid type is already correctly assigned, the best rotamer may be chosen to fit the density either automatically or manually. If the automatic option is chosen then the side-chain rotamer from the MolProbity library (Lovell et al., 2000 ▶) which gives rise to the highest electron-density values at the atomic centres is selected and rigid-body refined (this includes the main-chain atoms of the residues). Otherwise, the user is presented with a list of rotamers for that side-chain type sorted by frequency in the database. The user can then scroll through the list of rotamers using either the keyboard or user-interface buttons to select the desired rotamer. Rotamers are named according to the MolProbity system. Briefly, the χ angles are given letters according to the torsion angle: ‘t’ for approximately 180°, ‘p’ for approximately 60° and ‘m’ for approximately −60° (Lovell et al., 2000 ▶). The other two options (‘Mutate & Auto Fit’ and ‘Simple Mutate’) allow the amino-acid type to be assigned or changed. The ‘Mutate & Auto Fit Rotamer’ option allows an amino-acid type to be selected from a list and then immediately performs the autofit rotamer operation as above. The ‘Simple Mutate’ option changes the amino-acid type and builds the side-chain atoms in the most frequently occurring rotamer without further refinement. 4.3.8. Torsion editing (‘Edit Chi Angles’, ‘Edit Backbone Torsions’, ‘Torsion General’) Coot has different tools for editing the main-chain and side-chain (or ligand) torsion angles. The main-chain torsion angles, namely ϕ and ψ, can be edited using ‘Edit Backbone Torsion…’. With two sliders, the peptide and carbonyl torsion angles can be adjusted. A separate window showing the Ramachandran plot with the two residues forming the altered peptide bond is displayed with the position of the residues updated as the angles change. Side-chain (or ligand) torsion angles must be defined prior to editing. Either the user manually defines the four atoms forming the torsion angle (‘Torsion General’) or the torsion angles are determined automatically and the user selects the one to edit. In the latter case the bond around which the selected torsion angle is edited is visually marked. Using the mouse, the angle can then be rotated freely. 4.3.9. Other protein tools (‘Flip peptide’, ‘Side Chain 180° Flip’, ‘Cis→Trans’) There are three other tools to perform common corrections to protein models. ‘Flip peptide’ rotates the planar atoms in a peptide group through 180° about the vector joining the bounding Cα atoms (Jones et al., 1991 ▶). ‘Side Chain 180° Flip’ rotates the last torsion of a side chain through 180° (e.g. to swap the OD1 and ND2 side-chain atoms of Asn). ‘Cis→Trans’ shifts the torsion of the peptide bond through 180°, thereby changing the peptide bond from trans to cis and vice versa. 4.4. Tools for adding atoms to the model 4.4.1. Find waters The water-finding mechanism in Coot uses the same cluster analysis as is used in ligand fitting. However, only those clusters below a certain volume (by default 4.2 Å3) are considered as candidate sites for water molecules. The centre of each cluster is computed and a distance check is then made to the potential hydrogen-bond donors or receptors in the protein molecule (or other waters). The distance criteria for acceptable hydrogen-bond length are under user control. Additionally, a test for acceptable sphericity of the electron density is performed. 4.4.2. Add terminal residue The MolProbity ϕ, ψ distribution is used to generate a set of randomly selected ϕ, ψ pairs. To build additional residues at the N- and C-termini of protein chains, the MolProbity ϕ, ψ distribution is used to generate a set of positions of the N, Cα, O and C atoms of the next two residues. The conformation of these new atoms is then scored against the electron-density map and recorded. This procedure is carried out a number of times (by default 100). The best-fitting conformation is offered as a candidate to the user (only the nearest of the two residues is kept). 4.4.3. Add alternate conformation Alternate conformations are generated by splitting the residue into two sets of conformations (A and B). By default all atoms of the residue are split, or alternatively only the Cα and side-chain atoms are divided. If the residue chosen is a standard protein residue then the rotamer-selection dialogue described above is also shown, along with a slider to specify the occupancy of the new conformation. 4.4.4. Place atom at pointer This is a simple interface to place a typed atom at the position of the centre of the screen. It can place additional water or solvent molecules in un­modelled electron-density peaks and is used in conjunction with the ‘Find blobs’ tool, which allows the largest unmodelled peaks to be visited in turn. 4.5. Tools for handling noncrystallographic symmetry (NCS) Noncrystallographic symmetry (NCS) can be exploited during the building of an atomic model and also in the analysis of an existing model. Coot provides five tools to help with the building and visualization of NCS-related molecules. (i) NCS ghost molecules. In order to visualize the simi­larities and differences between NCS-related molecules, a ‘ghost’ copy of any or all NCS-related chains may be superimposed over a specific chain in the model. The ‘ghost’ copies are displayed in thin lines and coloured differently, as well as uniformly, in order to distinguish them from the original. The superposition may be performed automatically by secondary-structure matching (Krissinel & Henrick, 2004 ▶) or by least-squares superposition. An example of an NCS ghost molecule is shown in Fig. 5 ▶. (ii) NCS maps. The electron density of NCS-related molecules can be superimposed in order to allow differences in the electron density to be visualized. This is achieved by transforming the coordinates of the three-dimensional contour mesh, rather then the electron density itself, in order to provide good interactive performance. The operators are usually determined with reference to an existing atomic model which obeys the same NCS relationships. An example of an NCS map is shown in Fig. 6 ▶. (iii) NCS-averaged maps. In addition to viewing NCS-related copies of the electron density, the average density of the related regions may be computed and viewed. In noisy maps this can provide a clearer starting point for model building. (iv) NCS rebuilding. When building an atomic model of a molecule with NCS, it is often more convenient to work on one chain and then replicate the changes made in every NCS-related copy of that chain (at least in the early stages of model building). This can be achieved by selecting two related chains and replacing the second chain in its entirety, or in a specific residue range, with an NCS-transformed copy of the first chain. (v) NCS ‘jumping’. The view centre jumps to the next NCS-related peer chain and at the same time the NCS operators are taken into account so that the relative view remains the same. This provides a means for rapid visual comparison of NCS-related entities. 5. Validation Coot incorporates a range of validation tools from the com­parison of a model against electron density to comprehensive geometrical checks for protein structures and additional tools specific to nucleotides. It also provides convenient interfaces to external validation tools: most notably the MolProbity suite (Davis et al., 2007 ▶), but also to the REFMAC refinement software (Murshudov et al., 1997 ▶) and dictionary (Vagin et al., 2004 ▶). Many of the internal validation tools provide a uniform interface in the form of colour-coded bar charts, for example the ‘Density Fit Analysis’ chart (Fig. 7 ▶). This window contains one bar chart for each chain in the structure. Each chart contains one bar for each residue in the chain. The height and colour of the bar indicate the model quality of the residue, with small green bars indicating a good or expected/conventional conformation and large red bars indicating poor-quality or ‘unconventional’ residues. The chart is active, i.e. on moving the pointer over the bar tooltips provide relevant statistics and clicking on a bar changes the view in the main graphics window to centre on the selected residue. In this way, a rapid overview of model quality is obtained and problem areas can be investigated. In order to obtain a good structure for sub­mission, the user may simply cycle though the validation options, correcting any problems found. The available validation tools are described in more detail in the following sections. 5.1. Ramachandran plot The Ramachandran plot tool (Fig. 8 ▶) launches a new window in which the Ramachandran plot for the active molecule is displayed. A data point appears in this plot for each residue in the protein, with different symbols distinguishing Gly and Pro residues. The background of the plot shows frequency data for Ramachandran angles using the Richardsons’ data (Lovell et al., 2003 ▶). The plot is interactive: clicking on a data point moves the view in the three-dimensional canvas to centre on the corresponding residue. Similarly, selecting an atom in the model highlights the corresponding data point. Moving the mouse over a data point corresponding to a Gly or Pro residue causes the Ramachandran frequency data for that residue type to be displayed. 5.2. Kleywegt plot The Kleywegt plot (Kleywegt, 1996 ▶; Fig. 9 ▶) is a variation of the Ramachandran plot that is used to highlight NCS differences between two chains. The Ramachandran plot for two chains of the protein is displayed, with the data points of NCS-related residues in the two chains linked by a line for the top 50 (default) most different ϕ, ψ angles. Long lines in the corresponding figure correspond to significant differences in backbone conformation between the NCS-related chains. 5.3. Incorrect chiral volumes Dictionary definitions of monomers can contain descriptions of chiral centres. The chiral centres are described as ‘positive’, ‘negative’ or ‘both’. Coot can compare the residues in the protein structure to the dictionary and identify outliers. 5.4. Unmodelled blobs The ‘Unmodelled Blobs’ tool finds candidate ligand-binding sites (as described above) without trying to fit a specific ligand. 5.5. Difference-map peaks Difference maps can be searched for positive and negative peaks. The peak list is then sorted on peak height and filtered by proximity to higher peaks (i.e. only peaks that are not close to previous peaks are identified). 5.6. Check/delete waters Waters can be validated using several criteria, including distance from hydrogen-bond donors or acceptors, temperature factor or electron-density level. Waters that do not pass these criteria are identified and presented as a list or automatically deleted. 5.7. Check waters by difference map variance This tool is used to identify waters that have been placed in density that should be assigned to other atoms or molecules. The difference map at each water position is analysed by generating 20 points on each sphere at radii of 0.5, 1.0 and 1.5 Å and the electron-density level at each of these points is found by cubic interpolation. The mean and variance of the density levels is calculated for each set of points. If, for example, a water was misplaced into the density for a glycerol then (given an isotropic density model for the water molecule) the difference map will be anisotropic because there will be unaccounted-for positive density along the bonds to the other atoms in the glycerol. There may also be some negative density in a perpendicular direction as the refinement program tries to compensate for the additional electron density. The variances are summed and compared with a reference value (by default 0.12 e2 Å−6). Note that it only makes sense to run this test on a difference map generated by reciprocal-space refinement (for example, from REFMAC or phenix.refine) that included temperature-factor refinement. 5.8. Geometry analysis The geometry (bonds, angles, planes) for each residue in the selected molecule is compared with dictionary values (typically provided by the mmCIF REFMAC dictionary). Torsion-angle deviations are not analysed (as there are other validation tools for these; see §5.9). The statistic displayed in the geometry graph is the average Z value for each of the geometry terms for that residue (peptide-geometry distortion is shared between neighbouring residues). The tooltip on the geometry graph describes the geometry features giving rise to the highest Z value. 5.9. Peptide ω analysis This is a validation tool for the analysis of peptide ω torsion angles. It produces a graph marking the deviation from 180° of the peptide ω angle. The deviation is assigned to the residue that contains the C and O atoms of the peptide link, thus peptide ω angles of 90° are very poor. Optionally, ω angles of 0° can be considered ideal (for the case of intentional cis-peptide bonds). 5.10. Temperature-factor variance analysis The variance of the temperature factors for the atoms of each residue is plotted. This is occasionally useful to highlight misbuilt regions. In a badly fitting residue, reciprocal-space refinement will tend to expand the temperatures factors of atoms in low or negative density, resulting in a high variance. However, residues with long side chains (e.g. Arg or Lys) often naturally have substantial variance, even though the atoms are correctly placed, which causes ‘noise’ in this graph. This shortcoming will be addressed in future developments. H atoms are ignored in temperature-factor variance analysis. 5.11. Gln and Asn B-factor outliers This is another tool that analyses the results of reciprocal-space refinement. A measure z is computed that is half of the difference of the temperature factor between the NE2 and OE1 atoms (in the case of Gln) divided by the standard deviation of the temperature factors of the remaining atoms in the residue. Our analysis of high-resolution structures has shown that when z is greater than +2.25 there is a more than 90% chance that OE1 and NE2 need to be flipped (P. Emsley, unpublished results). 5.12. Rotamer analysis The rotamer statistics are generated from an analysis of the nearest conformation in the MolProbity rotamer probability distribution (Lovell et al., 2000 ▶) and displayed as a bar chart. The height of the bar in the graph is inversely proportional to the rotamer probability. 5.13. Density-fit analysis The bars in the density-fit graphs are inversely proportional to the average Z-weighted electron density at the atom centres and to the grid sampling of the map (i.e. maps with coarser grid sampling will have lower bars than a more finely gridded map, all other things being equal). Accounting for the grid sampling allows lower resolution maps to have an informative density-fit graph without many or most residues being marked as worrisome owing to their atoms being in generally low levels of density. 5.14. Probe clashes ‘Probe Clashes’ is a graphical representation of the output of the MolProbity tools Reduce (Word, Lovell, Richardson et al., 1999 ▶), which adds H atoms to a model (and thereby provides a means of analyzing potential side-chain flips), and Probe (Word, Lovell, LaBean et al., 1999 ▶), which analyses atomic packing. ‘Contact dots’ are generated by Probe and these are displayed in Coot and coloured by the type of interaction. 5.15. NCS differences The graph of noncrystallographic symmetry differences shows the r.m.s. deviation of atoms in residues after the transformation of the selected chain to the reference chain has been applied. This is useful to highlight residues that have unusually large differences in atom positions (the largest differences are typically found in the side-chain atoms). 6. Model analysis 6.1. Geometric measurements Geometric measurements can be performed on the model and displayed in a three-dimensional view using options from the ‘Measures’ menu. These measurements include bond lengths, bond angles and torsion angles, which may be selected by clicking successively on the atoms concerned. It is also possible to measure the distance of an atom to a least-squares plane defined by a set of three or more other atoms. The ‘Environment Distances’ option allows all neighbours within a certain distance of any atom of a chosen residue to be displayed. Distances between polar neighbours are coloured differently to all others. This is particularly useful in the initial analysis of hydrogen bonding. 6.2. Superpositions It is often useful to compare several related molecules which are similar in terms of sequence or fold. In order to do this the molecules must be placed in the same position and orientation in space so that the differences may be clearly seen. Two tools are provided for this purpose. (i) SSM superposition (Krissinel & Henrick, 2004 ▶). Secondary Structure Matching (SSM) is a tool for superposing proteins whose fold is related by fitting the secondary-structure elements of one protein to those of the other. This approach is automatic and does not rely on any sequence identity between the two proteins. The superposition may include a complete structure or just a single chain. (ii) LSQ superposition. Least-squares (LSQ) superposition involves finding the rotation and translation which minimizes the distances between corresponding atoms in the two models and therefore depends on having a predefined correspondence between the atoms of the two structures. This approach is very fast but requires that a residue range from one structure be specified and matched to a corresponding residue range in the other structure. 7. Interaction with other programs In addition to the built-in tools, e.g. for refinement and validation, Coot provides interfaces to external programs. For refinement, interfaces to REFMAC and SHELXL are pro­vided. Validation can be accomplished by interaction with the programs Probe and Reduce from the MolProbity suite. Furthermore, interfaces for the production of publication-quality figures are provided by communication with the (molecular) graphics programs CCP4mg, POV-Ray and Raster3D. 7.1. REFMAC  Coot provides a dialogue similar to that used in CCP4i for running REFMAC (Murshudov et al., 2004 ▶). REFMAC is a program from the CCP4 suite for maximum-likelihood-based macromolecular refinement. Once a round of interactive model building has finished, the user can choose to use REFMAC to refine the current model. Reflections for the refinement are either used from the MTZ file from which the currently displayed map was calculated or can be acquired from a selected MTZ file. Most REFMAC parameters are set as defaults; however, some can be specified in the GUI, such as the number of refinement cycles, twin refinement and the use of NCS. Once REFMAC has terminated, the newly generated (refined) model and MTZ file from which maps are generated are automatically read in (and displayed). If REFMAC detected geometrical outliers at the end of the refinement, an interactive dialogue will be presented with two buttons for each residue containing an outlier: one to centre the view on the residue and the other to carry out real-space refinement. 7.2. SHELXL  For high-resolution refinement, SHELXL can be used directly from Coot. A new SHELXL.ins file can be generated from a SHELXL.res file including any manipulations or additions to the model. Additional parameters may be added to the file or it can be edited in a GUI. Once refinement in SHELXL is finished, the refined coordinate file is read in and displayed. The resulting reflections file (.fcf) is converted into an mmCIF file, after which it is read in and the electron density is displayed. An interactive dialogue of geometric out­liers (disagreeable restraints and other problems discovered by SHELXL) can be displayed by parsing the .lst output file from SHELXL. 7.3. MolProbity  Coot interacts with programs and data from the Mol­Probity suite in a number of ways, some of which have already been described. In addition, MolProbity can provide Coot with a list of possible structural problems that need to be addressed in the form of a ‘to-do chart’ in either Python or Scheme format; this can be read into Coot (‘Calculate’→‘Scripting…’). 7.4. CCP4mg  Coot can write CCP4mg picture-definition files (Potterton et al., 2004 ▶). These files are human-readable and editable and define the scene displayed by CCP4mg. Currently, the view and all displayed coordinate models and maps are described in the Coot-generated definition file. Hence, the displayed scene in Coot when saving the file is identical to that in CCP4mg after reading the picture-definition file. For convenience, a button is provided which will automatically produce the picture-definition file and open it in CCP4mg. 7.5. Raster3D/POV-Ray  Raster3D (Merritt & Bacon, 1997 ▶) and POV-Ray (Persistence of Vision Pty Ltd, 2004 ▶) are commonly used programs for the production of publication-quality figures in macromolecular crystallography. Coot writes input files for both of these programs to display the current view. These can then be rendered and ray-traced by the external programs either externally or directly within Coot using ‘default’ parameters. The resulting images display molecular models in ball-and-stick representation and electron densities as wire frames. 8. Scripting Most internal functions in Coot are accessible via a SWIG (Simplified Wrapper and Interface Generator) interface to the scripting languages Python (http://www.python.org) and Guile (a Scheme interpreter; Kelsey et al., 1998 ▶; http://www.gnu.org/software/guile/guile.html). Via the same interface, some of Coot’s graphics widgets are available to the scripting layer (e.g. the main menu bar and the main toolbar). The availability of two scripting interfaces allows greater flexibility for the user as well as facilitating the interaction of Coot with other applications. In addition to the availability of Coot’s internal functions, the scripting interface is enriched by a number of provided scripts (usually available in both scripting languages). Some of these scripts use GUIs, either through use of the Coot graphics widgets or via the GTK+2 extensions of the scripting lan­guages. A number of available scripts and functions are made available in an extra ‘Extensions’ menu. Scripting not only provides the user with the possibility of running internal Coot functions and scripts but also that of reading and writing their own scripts and customizing the menus. 9. Building and testing When Coot was made available to the public, three initial considerations were that it should be cross-platform, robust and easy to install. These considerations continue to be a challenge. To assist in meeting them, an automated scheduled build-and-test system has been developed, thus enabling almost constant deployment of the pre-release software. The subversion version-control system (http://svnbook.red-bean.com/) is used to manage source-code revisions. An ‘integration machine’ checks out the latest source code several times per hour, compiles the software and makes a source-code tar file. Less frequently, a heterogeneous array of build machines copies the source tar file and compiles it for the host architecture. After a successful build, the software is run against a test suite and only if the tests are passed is the software bundled and made available for download from the web site. All the build and test logs are made available on the Coot web site. Fortunately, users of the pre-release code seem to report problems without undue exasperation. It is the aim of the developers to respond rapidly to such reports. 9.1. Computer operating-system compatibility Coot is released under the GNU General Public License (GPL) and depends upon many other GPL and open-source software components. Coot’s GUI and graphical display are based on rather standard infrastructure, including the X11 windowing system, OpenGL and associated software such as the cross-platform GTK+2 stack derived from the GIMP project. In addition, Coot depends upon open-source crystallographic software components including the Clipper libraries (Cowtan, 2003 ▶), the MMDB library (Krissinel et al., 2004 ▶), the SSM library (Krissinel & Henrick, 2004 ▶) and the CCP4 libraries. In principle, Coot and its dependencies can be in­stalled on any modern GNU/Linux or Unix platform without fanfare. A Windows-based version of Coot is also available. 9.2. Coot on GNU/Linux Compiling and installing Coot on the GNU/Linux operating system is probably the most straightforward option. GNU/Linux is in essence a free software/open-source collaborative implementation of the Unix operating system that is compatible with most computer hardware. Coot’s infrastructural dependencies, such as GTK+2 and other GNU libraries, as well as all of its crystallographic software dependencies, were selected with portability in mind. Most of the required dependencies are either installed with the GNOME desktop or are readily available for installation via the package-management systems specific to each distribution. It is possible that in future Coot (along with all its dependencies) will be made available via the official package-distribution systems for several of the major GNU/Linux distributions. When an end-user chooses to install the Coot package, all of Coot’s required dependencies will be installed along with it in a simple and painless procedure. An official Coot package currently exists in the Gentoo distribution (maintained by Donnie Berkholz), a Fedora package (maintained by Tim Fenn) is under development at the time of writing and unofficial Debian and rpm Coot packages are also available. Binary Coot releases for the most popular GNU/Linux platforms are available from the Coot website: http://www.ysbl.york.ac.uk/~emsley/software/binaries/. Additional information on installing Coot on GNU/Linux, either as a pre-compiled binary or from source code, is available on the Coot wiki: http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/COOT. 9.3. Coot on Apple’s Mac OS X With the release of Apple’s Mac OS X, a Unix-based operating system, it became possible to use most if not all of the standard crystallographic software on Apple computers. OS X does not natively use the X11 windowing system, but rather a proprietary windowing technology called Quartz. This system has some benefits over X11, but does not support X11-based Unix software. However, the X11 windowing system can be run within OS X (in rootless mode) and as of OS X version 10.5 this has become a default option and operates in a reasonably seamless manner. Unlike GNU/Linux, Apple does not provide the X11-based dependencies (GTK+2, GNOME libraries) and many of the other open-source components required to install and run Coot. However, third-party package-management systems have appeared to fill this gap, having made it their mission to port essentially all of the most important software that is freely available to users of other Unix-based systems to OS X. The two most popular package-management systems are Fink and MacPorts. Of these, Fink makes available a larger collection of software that is of use to scientists, including a substantial collection of crystallographic software. For that reason, Fink has been adopted as the preferred option for installing Coot on Mac OS X. Fink uses many of the same software tools as the Debian GNU/Linux package-management system and provides a convenient front-end. In practice, this requires the end user to do three things in preparation for installing Coot under OS X. (i) Install Apple’s X-code Developer tools. This is a free gigabyte-sized download available from Apple. (ii) Install the very latest version of X11. This is crucial, as many bug fixes are required to run Coot. (iii) Install the third-party package-management system Fink and enable the ‘unstable’ software tree to obtain access to the latest software. Coot may then be installed through Fink with the command fink install coot. 9.4. Coot on Microsoft Windows Since Microsoft Windows operating systems are the most widely used computer platform, a Coot version which runs on Microsoft Windows has been made available (WinCoot). All of Coot’s dependencies compile readily on Windows systems (although some require small adjustments) or are available as GPL/open-source binary downloads. The availability of GTK+2 (dynamically linked) libraries (DLLs) for Windows makes it possible to compile Coot without the requirement of the X11 windowing system, which would depend on an emulation layer (e.g. Cygwin). Some minor adjustments to Coot itself were necessary owing to differences in operating-system architecture, e.g. the filesystem (Lohkamp et al., 2005 ▶). Currently WinCoot, by default, only uses Python as a scripting language since the Guile GTK+2 extension module is not seen as robust enough on Windows. WinCoot binaries are, as for GNU/Linux systems, automatically built and tested on a regular basis. The program is executed using a batch script and has been shown to work on Windows 98, NT, 2000, XP and Vista. WinCoot binaries (stable as well as pre-releases) are available as a self-extracting file from http://www.ysbl.york.ac.uk/~lohkamp/coot/. 10. Discussion Coot tries to combine modern methods in macromolecular model building and validation with concerns about a modern GUI application such as ease of use, productivity, aesthetics and forgiveness. This is an ongoing process and although improvements can still be made, we believe that Coot has an easy-to-learn intuitive GUI combined with a high level of crystallographic awareness, providing useful tools for the novice and experienced alike. However, Coot has a number of limitations: NCS-averaged maps are poorly implemented, being meaningful only over a limited part of the unit cell (or crystal). There is also a mis­match in symmetry when using maps from cryo-EM data (Coot incorrectly applies crystal symmetry to EM maps). Coot is not at all easy to compile, having many dependencies: this is a problem for developers and advanced users. 10.1. Future Coot is under constant development. New features and bug fixes are added on an almost daily basis. It is anticipated that further tools will be added for validation, nucleotide and carbohydrate model building, as well as for refinement. Interactive model building will be enhanced by communication with the CCP4 database, use of annotations and an interactive notebook and by adding annotation representation into the validation graphs. The embedded scripting languages provide the potential for sophisticated communication with model-building tools such as Buccaneer, ARP/wARP and PHENIX; in future this may be extended to include density modification as well. In the longer term tools to handle EM maps are planned, including the possibility of building and refining models. The appropriate data structures are already implemented in the Clipper libraries but are not yet available in Coot. The integration of validation tools will be expanded, especially with respect to MolProbity, and an interface to the WHAT_CHECK validation program (Hooft et al., 1996 ▶) will be added. WHAT_CHECK provides machine-readable output and this can be read by Coot to provide both an interactive description and navigation as well as (requiring more work) a mode to automatically fix up problematic geometry. Note added in proof: Ian Tickle has noted a potential problem with the calculation of χ2 values resulting from real-space refinement. Coot will be reworked to instead represent the r.m.s. deviation from ideality of each of the geometrical terms.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            XDS

            1. Functional specification The program package XDS (Kabsch, 1988a ▶,b ▶, 1993 ▶, 2010 ▶) was developed for the reduction of single-crystal diffraction data recorded on a planar detector by the rotation method using monochromatic X-rays. It includes a set of three programs. XDS accepts a sequence of adjacent non-overlapping rotation images from a variety of imaging-plate, CCD, pixel and multiwire area detectors, infers crystal symmetry and metrics and produces a list of corrected integrated intensities of the reflections occurring in the images in a nearly automatic way. The program assumes that each image covers the same positive amount of crystal rotation and that the rotation axis, incident beam and crystal intersect at one point, but otherwise imposes no limitations on the detector position, on the directions of the rotation axis and incident beam or on the oscillation range covered by each image. XSCALE places the data sets obtained from processing with XDS on a common scale, optionally merges them into one or several sets of unique reflections and reports their completeness and the quality of the integrated intensities. It corrects the data for absorption effects, sensitivity variations in the detector plane and radiation damage. Optionally, it can correct reflections individually for radiation damage by extrapolation to their initial intensities at zero dose. XDSCONV converts reflection data files as obtained from XDS or XSCALE into various formats required by software packages for crystal structure determination. It can generate test reflections or inherit previously selected ones which are used for the calculation of a free R factor to monitor the progress of structure refinement. 2. XDS XDS is organized into eight steps (major subroutines) which are called in succession by the main program. Information is exchanged between the steps by files (see Table 1 ▶), which allows the repetition of selected steps with a different set of input parameters without rerunning the whole program. The files generated by XDS are either ASCII-type files that can be inspected and modified using a text editor or binary control images saved as a byte-offset variant of the CBFlib format (Bernstein & Hammersley, 2005 ▶; Bernstein & Ellis, 2005 ▶). Such images are indicated by the file-name extension .cbf and can be looked at using the open-source program XDS-Viewer written by Michael Hoffer. All files have a fixed name defined by XDS, which makes it mandatory to process each data set in a newly created directory in order to avoid name clashes. Clearly, one should not run more than one XDS job at a time in the same directory. Output files affected by rerunning selected steps (see Table 1 ▶) should also first be given another name if their original contents are meant to be saved. Data processing begins by copying an appropriate input file into the new directory. Input-file templates are provided with the XDS package for a number of frequently used data-collection facilities. The copied input file must be renamed XDS.INP and edited to provide the correct parameter values for the actual data-collection experiment. All parameters in XDS.INP are named by keywords containing an equals sign as the last character and many of them will be mentioned here in context in order to clarify their meaning. Execution of XDS (JOB=XDS) invokes each of the eight program steps as described below. The results and diagnostics from each step are saved in files with the extension .LP attached to the program-step name. These files should always be studied carefully to see whether processing was satisfactory or, in the case of failure, to find out what could have gone wrong. 2.1. XYCORR This program step calculates a lookup table of additive spatial corrections at each detector pixel which is stored in the files X-CORRECTIONS.cbf, Y-CORRECTIONS.cbf. Often, the data images have already been corrected for geometrical distortions, in which case XYCORR produces tables of zeros. For spiral read-out imaging-plate detectors the small corrections resulting from radial (ROFF=) and tangential (TOFF=) offset errors of the scanner are computed. For some multiwire and CCD detectors that deliver geo­metrically distorted images, corrections are derived from a calibration image (BRASS_PLATE_IMAGE=file name). This image displays the response to a brass plate containing a regular grid of holes which is mounted in front of the detector and illuminated by an X-ray point source. Clearly, the source must be placed exactly at the location to be occupied by the crystal during the actual data collection, as photons emanating from the calibration source are meant to simulate all possible diffracted beam directions. For visual control, spots that have been located and accepted from the brass-plate image by XYCORR are marked in the file FRAME.cbf. The following problems can be encountered in this step. (i) A misplaced calibration source can lead to an incorrect lookup table, impairing the correct prediction of the observed diffraction pattern in subsequent program steps. (ii) An underexposed calibration image can result in an incomplete and unreliable list of calibration spots. 2.2. INIT INIT determines three lookup tables, saved as the files BLANK.cbf, GAIN.cbf and BKGINIT.cbf, that are required by the subsequent processing steps for classifying pixels in the data images as background or belonging to a diffraction spot (‘strong’ pixels). These tables should be inspected with the XDS-Viewer program. BLANK.cbf contains a lookup table of the detector noise. It is determined from a specific image recorded in the absence of X-rays (DARK_CURRENT_IMAGE=) or is assumed to be a constant derived from the mean recorded value in each corner of the data images. GAIN.cbf codes for the expected variation of the pixel contents in the background region of a data image. The variance of the contents of a pixel in the background region is GAIN·(pixel contents − detector noise). The variance is determined from the scatter of pixel values within a rectangular box (NBX=, NBY=) of size (2·NBX + 1)·(2·NBY + 1) centred at each image pixel in succession. The table GAIN.cbf is used to distinguish background pixels from ‘strong’ pixels that are part of a diffraction spot. BKGINIT.cbf estimates the initial background at each pixel from a few data images specified by the input parameter BACKGROUND_RANGE=. The lookup table is obtained by adding the X-ray background from each image. Shaded regions on the detector (i.e. from the beamstop), pixels outside a user-defined circular region (TRUSTED_REGION=) or pixels with an undefined spatial correction value are classified as untrustworthy and marked by −3. The following problem can be encountered in this step. Some detectors with insufficient protection from electromagnetic pulses may generate badly spoiled images whose inclusion leads to a completely wrong X-ray background table. These images can be identified in INIT.LP by their un­expected high mean pixel contents and this step should be repeated with a different set of images. 2.3. COLSPOT COLSPOT locates strong diffraction spots occurring in a subset of the data images and saves their centroids in the file SPOT.XDS. The data subset is defined by contiguous image number ranges, where each range is specified by the keyword SPOT_RANGE=. As described in Kabsch (2010 ▶), spots are defined as sets of ‘strong’ pixels that are adjacent in three dimensions. The classification of ‘strong’ pixels is controlled by the decision constants STRONG_PIXEL= and BACKGROUND_PIXEL=. If the total number of ‘strong’ pixels occurring in the specified data images exceeds the upper limit as given by the input parameter MAXIMUM_NUMBER_OF_STRONG_PIXELS=, the weaker ones are discarded. A spot is accepted if it contains a minimum number of ‘strong’ pixels (MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=) and if the spot centroid is sufficiently close to the location of the strongest pixel in the spot (SPOT_MAXIMUM-CENTROID=). The following problem can be encountered in this step. Sharp edges such as ice rings in the images can lead to an excessive number of ‘strong’ pixels being erroneously classified as contributing to diffraction spots. These aliens could prevent IDXREF from recognizing the crystal lattice. 2.4. IDXREF IDXREF uses the initial parameters describing the diffraction experiment as provided by XDS.INP and the observed centroids of the spots from the file SPOT.XDS to find the orientation, metric and symmetry of the crystal lattice and refines all or a specified subset of these parameters [input parameter REFINE(IDXREF)=] . On return, the complete set of parameters are saved in the file XPARM.XDS and the original file SPOT.XDS is replaced by a file of identical name, now with indices attached to each observed spot. Spots not belonging to the crystal lattice are given indices 0, 0, 0. XDS considers the run to be successful if the coordinates of at least 70% of the given spots can be explained with reasonable accuracy (input parameter MAXIMUM_ERROR_OF_SPOT_POSITION=); otherwise, XDS will stop with an error message. Alien spots often arise because of the presence of ice or small satellite crystals and continuation of data processing may still be meaningful. In this case, XDS is called again with an explicit list of the subsequent steps specified in XDS.INP (input parameter JOB=DEFPIX XPLAN INTEGRATE CORRECT). IDXREF uses the methods described in Kabsch (1993 ▶, 2010 ▶) to determine a crystal lattice that explains the observed locations of the diffraction spots listed in the file SPOT.XDS. Firstly, a reciprocal-lattice vector referring to the unrotated crystal is computed from each observed spot centroid. Differences between any two reciprocal-lattice vectors that are above a specified minimal length (SEPMIN=) are accumulated in a three-dimensional histogram. These difference vectors will form clusters in the histogram, since there are many different pairs of reciprocal-lattice vectors of nearly identical vector difference. The clusters are found as maxima in the smoothed histogram (CLUSTER_RADIUS=) and a basis of three linearly independent cluster vectors is selected that allows all other cluster vectors to be expressed as nearly integral multiples of small magnitude with respect to this basis. The basis vectors and the 60 most populated clusters with attached indices are listed in IDXREF.LP. If many of the indices deviate significantly from integral values, the program is unable to find a reasonable lattice basis and all further processing will be meaningless. If the space group and unit-cell parameters are specified, a reduced cell is derived and the reciprocal-basis vectors found above are reinterpreted accordingly; otherwise, a reduced cell is determined directly from the reciprocal basis. The parameters of the reduced cell, the coordinates of the reciprocal-basis vectors and their indices with respect to the reduced cell are reported. Based on the orientation and metric of the reduced cell now available, IDXREF indexes up to 3000 of the strongest spots using the local-indexing method. This method considers each spot as a node of a tree and identifies the largest subtree of nodes which can be assigned reliable indices. The number of reflections in the ten largest subtrees is reported and usually shows a dominant first tree corresponding to a single lattice, whereas alien spots are found in small subtrees. Reflections in the largest subtree are used for initial refinement of the basis vectors of the reduced cell, the incident-beam wavevector and the origin of the detector, which is the point in the detector plane nearest to the crystal. Experience has shown that the detector origin and the direction of the incident beam are often specified with insufficient accuracy, which could easily lead to a misindexing of the reflections by a constant offset. For this reason, IDXREF considers alternative choices for the index origin and reports their likelihood of being correct. The parameters controlling the local indexing are INDEX_ERROR=, INDEX_MAGNITUDE=, INDEX_QUALITY= (corresponding to ∊, ϕ and 1 − ℓmin in Kabsch, 2010 ▶) and INDEX_ORIGIN=h 0, k 0, l 0, which is added to the indices of all reflections in the tree. After initial refinement based on the reflections in the largest subtree, all spots which can now be indexed are included. Usually, the detector distance and the direction of the rotation axis are not refined, but if the spots were extracted from images covering a large range of total crystal rotation then better results are obtained by including these parameters in the refinement [REFINE(IDXREF)=] . The refined metric parameters of the reduced cell are used to test each of the 44 possible lattice types as described in Kabsch (2010 ▶). For each lattice type, IDXREF reports the likelihood of its being correct and the conventional unit-cell parameters. The program step concludes with an overview of possible lattice symmetries, but makes no automatic decision for the space group. If the crystal symmetry is unknown, XDS will continue data processing with the crystal being described by its reduced-cell basis vectors and triclinic symmetry. Space-group assignment is postponed to the last program step, CORRECT, when integrated intensities are available. The following problems can be encountered in this step. (i) The indices of many difference-vector clusters deviate significantly from integral values. This can be caused by incorrect input parameters, such as rotation axis, oscillation angle or detector position, by a large fraction of alien spots in SPOT.XDS, by placing the detector too close to the crystal or by an inappropriate choice of the parameters SEPMIN= and CLUSTER_RADIUS= in densely populated images. (ii) Indexing and refinement is unsatisfactory despite well indexed difference-vector clusters. This is probably caused by the selection of an incorrect index origin and IDXREF should be rerun with plausible alternatives for INDEX_ORIGIN= after a visual check of a data image with XDS-Viewer. (iii) Despite successful indexing and refinement, IDXREF stops with the error message INSUFFICIENT PERCENTAGE OF INDEXED REFLECTIONS, complaining that less than 70% of the given spots could be explained. Alien spots often arise because of the presence of ice or small satellite crystals and continuation of data processing may still be meaningful. To continue data processing, just specify the missing processing steps in XDS.INP by JOB=DEFPIX XPLAN INTEGRATE CORRECT and call XDS again. 2.5. DEFPIX DEFPIX recognizes regions in the initial background table (file BKGINIT.cbf) that are obscured by intruding hardware and marks the shaded pixels as untrusted. In addition, pixels that are outside a user-defined resolution range (INCLUDE_RESOLUTION_RANGE=) are marked and eliminated from the trusted region. The marked background table that is thus obtained is saved in the file BKGPIX.cbf which is needed by the subsequent program steps. To recognize the obscured regions in the initial background, DEFPIX generates a control image (file ABS.cbf) that contains values of around 10 000 for unshaded pixels and lower values for shaded pixels. The classification of the pixels into reliable and untrusted pixels is based on the two input parameters VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= (default 6000 30 000) and INCLUDE_RESOLUTION_RANGE= (default 20.0 0.0). Pixels in the table ABS.cbf with a value outside the ranges specified by the two parameters are marked unreliable (by −3) in the background table BKGPIX.cbf. The following problem can be encountered in this step. If the parameter VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= specifies a value range that is too narrow, ‘good’ regions will erroneously be excluded from the trusted detector region. Check BKGPIX.cbf with the XDS-Viewer program and if necessary repeat the DEFPIX step with more appropriate values. 2.6. XPLAN XPLAN supports the planning of data collection. It is based upon information provided by the input files XPARM.XDS and BKGPIX.cbf, both of which become available on processing a few test images with XDS. XPLAN estimates the completeness of new reflection data expected to be collected for each given starting angle and total crystal rotation and reports the results for a number of selected resolution shells in the file XPLAN.LP. To minimize the recollection of data, the name of a file containing already measured reflections can be provided by the input parameter REFERENCE_DATA_SET=. The following problems can be encountered in this step. (i) Incorrect results may occur for some space groups, i.e. P42, if the unit cell determined by XDS from processing a few test images implicates reflection indices that are inconsistent with those from the reference data set. However, the correct cell choice can be found by using the old data as a reference and repeating CORRECT with the appropriate reindexing transformation, followed by copying GXPARM.XDS to XPARM.XDS. The same applies if IDXREF was run for an unknown space group and then reindexed in CORRECT. (ii) XPLAN ignores potential reflection overlap owing to the finite oscillation range covered by each image. 2.7. INTEGRATE INTEGRATE determines the intensity of each reflection predicted to occur in the rotation data images (DATA_RANGE=) and saves the results in the file INTEGRATE.HKL. The diffraction parameters needed to predict the reflection positions are initially provided by the file XPARM.XDS. These parameters are either kept constant or refined periodically using strong diffraction spots encountered in the data images. Whether refinement should be carried out at all and which parameters are to be refined can be specified by the user [input parameter REFINE(INTEGRATE)=]. The centroids of the strong spots in the data images are computed from pixels that exceed the background by a given multiple of standard deviations (input parameters SIGNAL_PIXEL=, BACKGROUND_PIXEL=). Strong spots are used in the refinement if their centroids are reasonably close to their calculated position (input parameter MAXIMUM_ERROR_OF_SPOT_POSITION=). For determination of the intensity, approximate values describing the extension and the form of the diffraction spot must be specified. The shapes of all spots become very similar when the contents of each of their contributing image pixels is mapped onto a three-dimensional coordinate system, specific for each reflection, which has its origin on the surface of the Ewald sphere at the terminus of the diffracted beam wavevector (see Kabsch, 2010 ▶). The transformed spot can roughly be described as a Gaussian involving two parameters: the standard deviations of the reflecting range σM (input parameter REFLECTING_RANGE_E.S.D.=σM) and the beam divergence σD (input parameter BEAM_DIVERGENCE_E.S.D.=σD). This leads to an integration region around the spot that is defined by the parameters δM (REFLECTING_RANGE=) and δD (BEAM_DIVERGENCE=), which are typically chosen to be 6–10 times larger than σM and σD, respectively. Appropriate values for these parameters are determined automatically by XDS (Kabsch, 2010 ▶); the user has the option to override the automatic assignments. Integration is carried out by a two-step procedure. In the first pass, spot templates are generated by superimposing the profiles of strong reflections after their mapping to the Ewald sphere. Grid points with a value above a minimum percentage of the maximum in the template (parameter CUT=) are marked for inclusion in the final integration. To allow for variations in their shape, profile templates are generated from reflections located at nine regions of equal size covering the detector surface and additional sets of nine to cover equally sized (parameter DELPHI=) batches of images. The actual integration is carried out in the second pass by profile fitting with respect to the spot shape determined in the first pass. Incomplete reflections below a minimum percentage of the observed reflection intensity (parameter MINPK=) will be discarded. Otherwise, the missing intensity is estimated from the learned reflection profiles. On return from the INTEGRATE step, all spots expected to occur in the last data image are encircled and the modified image is saved as the file FRAME.cbf for inspection. The following problems can be encountered in this step. (i) Off-centred profiles indicate incorrectly predicted reflection positions by using the parameters provided by the file XPARM.XDS (i.e. misindexing by using a wrong origin of the indices), crystal slippage or change in the incident-beam direction. (ii) Profiles extending to the borders of the box indicate too-small values of the parameters BEAM_DIVERGENCE= or REFLECTING_RANGE=. This leads to incorrect integrated intensities because of truncated reflection profiles and un­reliable background determination. (iii) Display of the file FRAME.cbf shows spots which are not encircled. If these unexpected reflections are not close to the spindle and are not ice reflections, then it is likely that the parameters provided by the file XPARM.XDS are wrong. 2.8. CORRECT CORRECT applies correction factors to the intensities and standard deviations of all reflections found in the file INTEGRATE.HKL, determines the space group if unknown and refines the unit-cell parameters, reports the quality and com­pleteness of the data set and saves the final integrated intensities in the file XDS_ASCII.HKL. Some of the employed algorithms are new and are described in Kabsch (2010 ▶). CORRECT accepts reflections from the file INTEGRATE.HKL that are (i) recorded (parameter MINPK=) on specified images (parameter DATA_RANGE=); (ii) within a given resolution range (parameter INCLUDE_RESOLUTION_RANGE=); (iii) outside ice rings (parameter EXCLUDE_RESOLUTION_RANGE=); (iv) not overloaded (parameter OVERLOAD=); and (v) not marked for exclusion in the file REMOVE.HKL. Thus, the user has the option to exclude unreliable reflections from the final data set by repeating the CORRECT step with appropriate parameter values. The intensities of the accepted reflections are first corrected for effects arising from polarization of the incident beam (parameters FRACTION_OF_POLARIZATION=, POLARIZATION_PLANE_NORMAL=) and absorption effects (parameters AIR=, SILICON=, SENSOR_THICKNESS=) arising from differences in path lengths of the diffracted beam. These corrections do not depend on knowledge of the space group. The integrated intensities of the reflections in the file INTEGRATE.HKL may or may not have been indexed in the correct space group; for the purpose of integration, it is important only that all reflections occurring in the data images have been indexed with respect to some unit-cell basis and that their locations on the images were hit exactly. The correct reflection indices in the true space group are always a linear transformation of the original indices used in INTEGRATE.HKL. All lattices consistent with the locations of the reflections saved in INTEGRATE.HKL (decision parameters MAX_CELL_AXIS_ERROR=, MAX_CELL_ANGLE_ERROR=) and their corresponding linear transformations are printed to provide a useful overview similar to that shown in IDXREF.LP. If the space group is not specified, XDS proposes one of the enantiomorphous space groups without screw axes that is compatible with the observed lattice symmetry and explains the intensities of a subset of the reflections (parameter TEST_RESOLUTION_RANGE=) at an acceptable R meas (Diederichs & Karplus, 1997 ▶; Weiss, 2001 ▶) using a minimum number of unique reflections. The criteria for an acceptable R meas are controlled by the decision parameters MIN_RFL_Rmeas= and MAX_FAC_Rmeas=. The user can always override the automatic decisions by specifying the correct space-group number (parameter SPACE_GROUP_NUMBER=) and unit-cell parameters (parameter UNIT_CELL_CONSTANTS=) in XDS.INP and repeating the CORRECT step. This provides a simple way to rename orthorhombic unit-cell parameters, which often becomes necessary if screw axes are present. In addition, the user has the option to specify the following in XDS.INP: (i) a reference data set (parameter REFERENCE_DATA_SET=), (ii) a reindexing transformation (parameter REIDX=) and (iii) three basis vectors if known from processing a previous data set taken at the same crystal orientation in a multi-wavelength experiment (parameters UNIT_CELL_A-AXIS=, UNIT_CELL_B-AXIS=, UNIT_CELL_C-AXIS=). The possibility of comparing the new data with a reference data set is particularly useful for resolving the issue of alternative settings of polar or rhombohedral cells (such as P4, P6 and R3). Also, reference data are quite useful for recognizing misindexing or for testing potential heavy-atom derivatives. For refinement of the unit-cell parameters [parameter REFINE(CORRECT)=], CORRECT uses a subset of the accepted reflections whose observed centroid is sufficiently close to the predicted spot position (parameter MAXIMUM_ERROR_OF_SPOT_POSITION=). The refined set of parameters is saved in the file GXPARM.XDS, which has an identical layout to the file XPARM.XDS produced by IDXREF. If the crystal has not slipped during data collection, these parameters are quite accurate. Other correction factors (parameter CORRECTIONS=) which partially compensate for radiation damage, absorption effects and variations in the sensitivity of the detector surface are determined from the symmetry-equivalent reflections usually found in the data images. The corrections are chosen such that the integrated intensities of symmetry-equivalent reflections come out as similar as possible. The user may control application of the various corrections by specifying the parameter CORRECTIONS= by a combination of the key­words DECAY MODULATION ABSORPTION. Whether Friedel pairs are considered as symmetry-equivalent reflections in the calculation of the correction factors depends on the values of the two parameters STRICT_ABSORPTION_CORRECTION= and FRIEDEL’S_LAW=. The number of correction factors is controlled by the input parameters MINIMUM_I/SIGMA=, NBATCH= and REFLECTIONS/CORRECTION_FACTOR=. The residual scatter in intensity of symmetry-equivalent reflections is used to estimate their standard deviations. Here, the initial estimate v 0(I) (obtained from the INTEGRATE step) for the variance of the reflection intensity I is replaced by v(I) = a[v 0(I) + bI 2]. The two constants a and b are chosen to minimize discrepancies between v(I) and the variance estimated from sample statistics of symmetry-related reflections. Based on the more realistic error estimates for the intensities, outliers are recognized by comparison with other symmetry-equivalent reflections. These outliers are included in the main output file XDS_ASCII.HKL, in which they are marked by a negative sign attached to the estimated standard deviations of their intensity. Classification of a reflection as a misfit is con­trolled by a decision constant which has the default value of WFAC1=1.5. Specification of a lower value such as WFAC1=1.0 by the user will lead to an increasing number of misfits and lower R factors as outliers are not included in the reported statistics. Data quality as a function of resolution is described by the agreement of intensities of symmetry-related reflections and quantified by the R factors R merge and the more robust indicator R meas (Diederichs & Karplus, 1997 ▶; Weiss, 2001 ▶). These R factors as well as the intensities of all reflections with indices of type h00, 0k0 and 00l and those expected to be systematically absent provide important information for identification of the correct space group. Clearly, large R factors or many rejected reflections or large observed intensities for reflections that are expected to be systematically absent suggest that the assumed space group or indexing is incorrect. The presence or absence of anomalous scatterers is specified by the parameter FRIEDEL’S_LAW=. Finally, CORRECT analyzes the distribution of reflection intensities as a function of their resolution and reports outliers from the Wilson plot. Often, these aliens arise from ice rings in the data images. To suppress the un­wanted reflections from the final output file XDS_ASCII.HKL, the user copies them to a file named REMOVE.HKL in the current directory and repeats the CORRECT step. The following problems can be encountered in this step. (i) Incomplete data sets may lead to wrong conclusions about the space group, as some of its symmetry operators might not be involved in the R-factor calculations. (ii) Often, the CORRECT step is repeated several times. It should be remembered that XDS overwrites earlier versions of the output files XDS_ASCII.HKL, GXPARM.XDS etc. 3. XSCALE The scaling program XSCALE (i) puts one or more files obtained from data processing with XDS on a common scale and reports the completeness and quality of the data sets; (ii) offers a choice of either combining symmetry-equivalent observations into a single unique reflection or saving the scaled but unmerged observations in the output file; (iii) allows several output files that are placed on the same scale, a feature that is recommended for MAD data sets taken from the same crystal at different wavelengths; (iv) determines correction factors that partially compensate for absorption effects, sensitivity variations in the detector plane and radiation damage; and (v) can correct reflections individually for radiation damage (Diederichs et al., 2003 ▶). The program uses a new fast algorithm (Kabsch, 2010 ▶) and imposes no limitations on the number of data sets or scaling/correction factors. The easiest way to run XSCALE is to copy a template input file named XSCALE.INP to a new directory and to replace the parameter values by the appropriate values describing the actual scaling run. The input parameters may be given in arbitrary order, except for the parameters defining the input and output reflection files (INPUT_FILE=, OUTPUT_FILE=). Here, an output file is defined first by the parameter OUTPUT_FILE= that will include the scaled and merged reflections from all following input files specified by the parameters INPUT_FILE= until the next occurrence of OUTPUT_FILE= in XSCALE.INP. An arbitrary number of output files can be specified (together with their set of input files) in a single run of XSCALE. All output files are then on the same scale, which is a useful program feature for MAD data sets. The reflections in each output file will be unmerged and Friedel pairs will be considered to be different if this holds for all of the input data sets unless explicitly redefined by the parameters MERGE= and FRIEDEL’S_LAW=. Moreover, each output file accepts an additional parameter that controls how the Friedel pairs of the input files are treated in the calculation of the absorption correction factors. If STRICT_ABSORPTION_CORRECTION=FALSE, Friedel pairs are treated as symmetry-equivalent reflections in these calculations, which could lead to an underestimate of the anomalous differences in the presence of anomalous scatterers. Friedel pairs are only treated as different reflections in the calculations if STRICT_ABSORPTION_CORRECTION=TRUE and FRIEDEL’S_LAW=FALSE. For each input file, a resolution window for accepting reflections (INCLUDE_RESOLUTION_RANGE=), the extent of absorption corrections (CORRECTIONS=DECAY MODULATION ABSORPTION) and the number of correction factors (NBATCH=) can be specified. Finally, each input data set can be corrected for radiation damage by specifying the name of the crystal the data set was obtained from (CRYSTAL_NAME=). Specification of this parameter implicates zero-dose extrapolation of individual reflection intensities to compensate for the effects of radiation damage experienced by the crystal so far (see Diederichs et al., 2003 ▶). Each resulting scaled data set is of XDS_ASCII format. It can be converted into a CCP4-style multi-record MTZ file using the copy feature of the program POINTLESS (Evans, 2006 ▶) available from the web (ftp://ftp.ccp4.ac.uk/ccp4/6.0.2/prerelease/pointless.html) or converted by XDSCONV into the format required by various structure-solution packages. 4. XDSCONV XDSCONV accepts reflection-intensity data files as produced by XSCALE or CORRECT and converts them into the format required by software packages for structure determination. XDSCONV estimates structure-factor moduli based on the assumption that the intensity data set obeys Wilson’s distribution and uses a Bayesian approach to statistical inference as described by French & Wilson (1978 ▶). The output file generated may inherit the test reflections previously used to calculate a free R factor (Brünger, 1992 ▶) or may contain new test reflections selected by XDSCONV. 5. Parallelization of XDS In order to efficiently use modern multiprocessor hardware, a major effort has been undertaken to replace the original code of XDS by routines that can run concurrently with very little need for synchronization. As described above, data processing by XDS is organized into eight steps that must be executed in a fixed order since the result of each step is needed as input for the subsequent ones. Thus, the only way to speed up processing is to make each step faster. The most computationally intensive steps are COLSPOT and INTEGRATE and, to a lesser degree, the routine that refines diffraction parameters in IDXREF and CORRECT. Thus, the highest savings in wall clock time are expected to result from changing these routines so that each one can make efficient use of the multiprocessor hardware. Two methods can be used (simultaneously) to speed up data processing. In the first method, XDS divides the set of data images into approximately equal portions, calls a shell script that starts an independent job for processing each portion of images by the computer cluster and waits until all jobs have finished. The number of such independent jobs can be limited by the user (MAXIMUM_NUMBER_OF_JOBS=); up to 99 jobs are allowed. This method works even if the processors do not share the same address space since the jobs are independent processes that do not communicate at all. The second method uses OpenMP to control execution by a team of threads and relies on a shared-memory multiprocessor platform. This allows the program to exploit data parallelism at a more fine-grained level to speed up refinements and routines for setting up and solving systems of linear equations. The maximum number of threads that can be employed by the parallel version of XDS (xds_par, xscale_par) can be limited by the user (MAXIMUM_NUMBER_OF_PROCESSORS=); up to 32 processors can be used. OpenMP has been chosen for execution control because it hardly adds to the complexity of the program code and most importantly does not require the maintenance of separate versions of the source code depending on whether the program is intended for execution by a team of processors or just by a single CPU. Moreover, OpenMP has become the de facto standard and compilers accepting OpenMP directives are available for most shared-memory multiprocessor platforms. The new version of COLSPOT comprises an initial part, a concurrent procedure and a final part. After initialization each available processor is kept busy analyzing its share of rotation images for strong pixels, which are saved in a processor-specific file. In the final sequential part of COLSPOT all files resulting from the concurrent computations are read and the location addresses, image running numbers and signal values of the strong pixels are stored in a hash table. Strong pixels belonging to the same spot can be located rapidly in this table and the centroids of the spots are saved in the final output file from this step. For the INTEGRATE step, the rotation images are divided into approximately equal portions for independent processing under control by a shell script according to the first method described above. When all jobs have finished, the integrated intensities from each independent file are joined. Minor problems could occur for reflections that receive intensity contributions from images that have been processed by different jobs. Compared with processing as a single job, the observed intensity differences are small and disappear if the different jobs use identical reference profiles and diffraction parameters to predict spot locations [to avoid refinements, specify REFINE(INTEGRATE)=!]. In addition, each of the independent jobs can be executed by a team of processors controlled by OpenMP. The rotation images analyzed by each job are split into a sequence of batches of consecutive images that cover a total rotation range that is large enough to accommodate the integration domain. The batches are evaluated in strictly sequental order; parallel processing is confined to images within each batch. The restructured routine for the INTEGRATE step consists of code regions for parallel execution interspersed by sequential sections. After initialization, strong reflections and their mean size and extent are determined concurrently. The diffraction parameters are refined in parallel processing mode based on the observed spot locations. In the following sequential section a database is generated containing information about all reflections occurring in this batch of images. A subset of strong reflections is also identified that is useful for the subsequent reflection-profile learning pass. The mean profile of these reflections is determined concurrently in a second pass through the images in the batch. Reflection integration by profile fitting is carried out in parallel in the third cycle through the batch. In the final sequential step the results from each job, which have been saved in files, are harvested and intensity contributions to the same reflection from adjacent batches are merged. 6. Availability Documentation and executable versions of the XDS package for widely used computer systems running under Linux or OSX can be obtained from the XDS homepage (http://xds.mpimf-heidelberg.mpg.de/) free of charge for use by academics for noncommercial applications. Additional information can be found at http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/XDS. For looking at rotation data images and control images generated by XDS, an open-source program XDS-Viewer written by Michael Hoffer can be obtained from http://xds-viewer.sourceforge.net under the GNU General Public License. A graphical interface XDSi (Kursula, 2004 ▶) is available (http://cc.oulu.fi/~pkursula/xdsi.html) that simplifies the operation of XDS.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Overview of the CCP4 suite and current developments

              1. Introduction CCP4 (Collaborative Computational Project, Number 4, 1994 ▶) exists to produce and support a world-leading integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography and other bio­physical techniques. CCP4 aims to develop and support the development of cutting-edge approaches to the experimental determination and analysis of protein structure and to integrate these approaches into the CCP4 software suite. CCP4 is a community-based resource that supports the widest possible researcher community, embracing academic, not-for-profit and for-profit research. CCP4 aims to play a key role in the education and training of scientists in experimental structural biology. It encourages the wide dissemination of new ideas, techniques and practice. In this article, we give an overview of the CCP4 project, past, present and future. We begin with a historical perspective on the growth of the software suite, followed by a summary of the current functionality in the suite. We then discuss ongoing plans for the next generation of the suite which is in development. In this account we focus on the suite as a whole, while other articles in this issue delve deeper into individual programs. We intend that this article could serve as a general literature citation for the use of the CCP4 software suite in structure determination, although we also encourage the citation of individual programs, many of the relevant references for which are included here. While we focus here on the CCP4 software suite, we would emphasize that comparable functionality is available in other software packages such as SHARP/autoSHARP (Vonrhein et al., 2007 ▶), SHELX (Sheldrick, 2008 ▶), ARP/wARP (Langer et al., 2008 ▶), PHENIX (Adams et al., 2010 ▶) and many others. 2. Evolution of the CCP4 software suite The CCP4 software suite is a collection of programs implementing specific algorithms concerned with macromolecular structure solution from X-ray diffraction data. Significantly, it is a collection of autonomous and independently developed programs. While some have been commissioned by the academic committees overseeing the CCP4 project, the majority originate from the community to address a perceived gap in current functionality or to implement newly developed algorithms. The result is a collection of around 200 programs, ranging from large programs which are effectively packages in themselves to small ‘jiffy’ programs. Over the years the suite has grown continuously, with each major release featuring significant new software (see Table 1 ▶). Unsurprisingly, there is overlap of functionality, with several programs performing a particular task, albeit often using different approaches. The question then is how to combine these programs into a software suite, both in terms of ensuring communication between the different programs and in helping both naïve and experienced users to navigate through the suite. Early on in the history of CCP4, there was an agreement for all programs to use the same file formats for data files. Formats were specified for diffraction data (the LCF format, later replaced by the MTZ format) and for electron-density maps (the CCP4 map format), while for atomic coordinates the PDB format was adopted. A software library was developed to facilitate reading and writing of these data formats and thereby ensure standardization of the formats. Originally supporting only Fortran programs, the library was re-written to support both Fortran and C/C++ as well as scripting languages (Winn et al., 2002 ▶). The CCP4 set of libraries has since expanded to cover a wider range of crystallographic tasks, in particular with the addition of the Clipper library (Cowtan, 2003 ▶), the MMDB library (Krissinel et al., 2004 ▶) and the CCTBX library (Grosse-Kunstleve et al., 2002 ▶) from the PHENIX project (Adams et al., 2010 ▶). Crystallographic tasks were performed by writing or adapting scripts (e.g. Unix shell or VMS scripts) to link together a number of programs (Fig. 1a ▶) and the suite can still be run in this way. The programs communicate solely via the data files which are passed between them. The user sets program options based on the program documentation and the expected results from earlier steps. A major change was introduced in 2000 with the release of the graphical user interface ccp4i (Fig. 1b ▶; Potterton et al., 2003 ▶). Task interfaces help the user to prepare run scripts. Details of how to run specific programs are largely hidden, as are the jiffy programs used to perform minor functions such as format conversion. Some limited intelligence in the interface code allows program options to be customized according to properties of the data and/or the desired objective. ccp4i interfaces are now available for all of the commonly used CCP4 programs as well as for several non-CCP4 programs (e.g. ARP/wARP; Langer et al., 2008 ▶). The ccp4i interface also introduced for the first time tools for helping the user to organize data. Jobs that have been run were recorded in a ‘database’ (in reality a directory of files) with tools to access and interpret the files saved there. Jobs are further organized into projects, representing different structure solutions. There are now plans to update the CCP4 GUI (see §4), but the impact of the original ccp4i on the suite should not be underestimated. In the last few years, two other modes of accessing the CCP4 suite have emerged. On the one hand, the latest version of the suite contains four complementary automation pipelines, namely xia2 (Winter, 2010 ▶), CRANK (Ness et al., 2004 ▶), MrBUMP (Keegan & Winn, 2007 ▶) and BALBES (Long et al., 2008 ▶). These pipelines attempt to perform large sections of the full structure solution (e.g. phasing) without user intervention. This is achieved partly through the use of a large number of trials, trying different protocols and performing parameter scanning. Such an approach can be very powerful, using cheap computer power to make many more attempts than a user would manually. Automation pipelines have been realised in the last few years because of the maturity of the underlying programs and the availability of sufficient computer power to support multiple trials. On the other hand, graphical programs for interactive use have become more powerful. Rather than simply reviewing the results of previously run programs and performing interactive model editing, Coot (Emsley et al., 2010 ▶) can launch separate refinement and validation programs (Fig. 1c ▶). Similarly, iMOSFLM can be used to interface the data-processing programs POINTLESS and SCALA. In some ways this is a completely different scenario to the automation pipelines. User interaction is paramount, with crystallo­graphy programs acting as tools to be invoked. The user can become familiar with the data and structure and use this to make intelligent decisions. Such an approach has also become possible because of the maturity of the invoked programs and the availability of sufficient computer power to run the programs interactively. 3. Overview of current functionality In this section, we give an overview of the current functionality of the CCP4 software suite (corresponding to release series 6.1 at the time of writing). We summarize the automation pipelines and individual programs included in the suite; many more details can be found in the accompanying articles in this issue. We present the functionality in the traditional manner, starting at data processing and ending at validation. However, it is becoming increasingly apparent that these neat categories are breaking down. 3.1. Data processing The earliest starting point for entry into the CCP4 suite is a set of X-ray diffraction images. The data-reduction program MOSFLM (Leslie, 2006 ▶) will take a set of diffraction images, identify spots on each image, index the diffraction pattern and thus identify the Bragg peaks, and integrate the spots. The output is a list of integrated intensities and their standard uncertainties labelled by the h, k, l indices. Associated information includes the batch number of the image from which the intensity was obtained, whether the peak was full or partial and the symmetry operation that relates the particular observation to the chosen asymmetric unit. MOSFLM continues to be improved, with support added recently for Pilatus detectors, addition of automatic backstop masking etc. The most visible change is the replacement of the old X-­windows-based interface with the Tcl-based iMOSFLM interface (Fig. 2 ▶), which guides the user in a stepwise manner through the stages of data processing. POINTLESS is a relatively new program whose primary purpose is to identify the Laue group of a crystal from an unmerged data set (Evans, 2006 ▶). The program will also attempt to identify the space group from an analysis of systematic absences. A secondary purpose is to test the choice of indexing and re-index a data set if necessary. Given a choice of space group, the program SCALA (Evans, 2006 ▶) will refine the parameters of a scaling function for an unmerged data set, apply scales to each observation of a reflection and merge all observations of a reflection to give an average intensity. It will also provide an improved estimate of the standard uncertainty of each intensity. The new program CTRUNCATE (which replaces the older TRUNCATE; Stein, unpublished program) can then convert the intensities to structure-factor amplitudes, although downstream programs increasingly use the mean intensities directly. Perhaps more importantly, CTRUNCATE will analyse a data set for signs of twinning, translational noncrystallographic symmetry (NCS), anisotropy and other notable features, since it is best to identify problems before attempting phasing. The program SFCHECK (Vaguine et al., 1999 ▶) will also provide an analysis of a data set, including testing for twinning and translational NCS, estimating the optical resolution and the anisotropy, and plotting the radial and angular completeness. The previous steps of data processing are automated by the xia2 pipeline (Winter, 2010 ▶). From a directory of images, xia2 will identify the type of experiment (multi-wedge, multi-pass, multi-wavelength) and process accordingly. The pipeline will determine the point group, space group and correct indexing. Multiple processing pipelines using alternative underlying programs are supported. At the end, the user should have a set of merged structure-factor amplitudes suitable for input to phasing. 3.2. Experimental phasing CCP4 includes the CRANK pipeline (Ness et al., 2004 ▶), which covers experimental phasing and beyond, and interfaces with several CCP4 and non-CCP4 programs. Heavy-atom sub­structure detection is performed by AFRO/CRUNCH2 (de Graaff et al., 2001 ▶) or by SHELXC/D (Sheldrick, 2008 ▶) and initial phasing is carried out by BP3 (Pannu et al., 2003 ▶; Pannu & Read, 2004 ▶) or SHELXE (Sheldrick, 2008 ▶). Phase improvement is carried out by SOLOMON (Abrahams & Leslie, 1996 ▶), DM (Cowtan et al., 2001 ▶) or Pirate (Cowtan, 2000 ▶) and automated model building by Buccaneer (Cowtan, 2006 ▶; Cowtan, 2008 ▶) or ARP/wARP (Langer et al., 2008 ▶). CRANK thus supports a range of underlying software handling the communication of data and allowing the user to trial different combinations. CCP4 includes a number of additional individual programs, each of which has its own particular strength. The long-standing CCP4 program MLPHARE for phasing still works in straight­forward cases and is fast to use. ACORN (Jia-xing et al., 2005 ▶; Dodson & Woolfson, 2009 ▶) uses ab initio methods for the determination of phases starting from a small fragment which could be a single heavy atom. The use of ab initio methods usually requires atomic resolution data, since it assumes atomicity of the electron density. However, a variant of the so-called free-­lunch algorithm (Jia-xing et al., 2005 ▶) allows the temporary generation of phases to atomic resolution which the ACORN method can utilize. The OASIS program (Wu et al., 2009 ▶) also uses ab initio methods to break the phase ambiguity in SAD/SIR phasing. Phaser (McCoy et al., 2007 ▶) can obtain phase estimates starting from known heavy-atom positions and SAD data. Log-likelihood gradient (LLG) maps are used to automatically find additional sites for anomalous scatterers and to detect anisotropy in existing anomalous scatterers. Phaser can also use a partial model, for example from a molecular-replacement solution that is hard to refine, as a source of phase information to help locate weak anomalous scatterers and thus improved phases. The latter reflects the view of experimental phasing and molecular replacement as just two sources of phase information rather than two separate techniques. 3.3. Molecular replacement CCP4 includes two pipelines for molecular replacement (MR): MrBUMP (Keegan & Winn, 2007 ▶) and BALBES (Long et al., 2008 ▶). Both start from processed data and a target sequence and aim to deliver a molecular-replacement solution consisting of positioned and partially refined models. BALBES uses its own database of protein molecules and domains taken from the PDB and customized for MR, while MrBUMP uses public databases and a set of widely available bioinformatics tools to generate possible search models. BALBES is based around the MR program MOLREP (Vagin & Teplyakov, 1997 ▶, 2010 ▶), while MrBUMP can also use the program Phaser (McCoy et al., 2007 ▶). Both MOLREP and Phaser are also available as stand-alone programs in CCP4. As well as providing rotation and translation functions, whereby a search model is positioned in the unit cell to give an initial estimate of the phases, these programs provide additional functionality, including a significant contribution to automated decision-making. For instance, a single run of Phaser can search for several copies each of several components in the structure of a complex, testing different possible search orders and trying different possible choices of space group. The search model for MR may be an ensemble of structures, a set of models from an NMR structure or an electron-density map. Phases for the target may be available, so that the search model is to be fitted into electron density, or there may be density available from an electron-microscopy experiment. The MR step can be followed by rigid-body refinement and the packing of the MR solution can be checked. Much of this functionality is common to Phaser and MOLREP, but there are a number of differences in implementation, so that both may prove useful in certain circumstances. A crucial component of MR is the selection and preparation of search models. The program CHAINSAW (Stein, 2008 ▶) takes as input a sequence alignment which relates residues in the search model to residues in the target protein and uses this information to edit the search model appropriately. The output model is labelled according to the target sequence. MOLREP (Lebedev et al., 2008 ▶) can take as input the target sequence and performs its own alignment to the search model in order to edit the search model. 3.4. Phase improvement and automated model building Having obtained initial phases from experimental phasing, the next step is phase improvement (density modification) to give a map that can be built into. When phases come from molecular replacement, phase improvement may also be useful to reduce model bias. For a long time, the main CCP4 phase-improvement programs were DM (Cowtan et al., 2001 ▶) and SOLOMON (Abrahams & Leslie, 1996 ▶), which covered the standard techniques of solvent flattening/flipping, histogram matching and NCS averaging. More recently, statistically based methods have been incorporated into the program Pirate (Cowtan, 2000 ▶). Pirate can give better results, but has been found to be inconveniently slow. The latest program Parrot (Cowtan, 2010 ▶) achieves similar improvements but is also fast and automated. Given an electron-density map, automated model building is provided in CCP4 by Buccaneer (Cowtan, 2006 ▶, 2008 ▶). This finds candidate Cα positions, builds these into chain fragments, joins the fragments together and docks a sequence. NCS can be used to rebuild and complete related chains. Since version 1.4, there is support for model (re)building after molecular replacement and for supplying known structural elements such as heavy atoms. The CCP4 suite includes an interface for alternating cycles of model building with Buccaneer with cycles of model refinement with REFMAC5. The supplementary program Sloop (Cowtan, unpublished program) builds missing loops using fragments taken from the Richardson’s Top500 library of structures (Lovell et al., 2003 ▶) to fill gaps in the chain. The chance of finding a good fit falls with increasing size of the gap, but the method may work for loops of up to eight residues in length. RAPPER (Furnham et al., 2006 ▶) provides a conformational search algorithm for protein modelling, which can produce an ensemble of models satisfying a wide variety of restraint information. In the context of CCP4, restraints on the modelling are provided by the electron density and/or the locations of the Cα atoms. The ccp4i interface includes modes for loop building or for building the entire structure. 3.5. Refinement and model completion The aim of macromolecular crystallography is to produce a model of the macromolecule of interest which explains the diffraction images as accurately and completely as possible. Both the form of the model and the parameters of the model need to be defined. Refinement is the process of optimizing the values of the model parameters and in CCP4 is performed by the program REFMAC5 (Murshudov et al., 1997 ▶). REFMAC5 will refine atomic coordinates and atomic isotropic or anisotropic displacement parameters (Murshudov et al., 1999 ▶), as well as group parameters for rigid-body refinement and TLS refinement (Winn et al., 2001 ▶, 2003 ▶). It will also refine scaling parameters and a mask-based bulk-solvent correction. When good-quality experimental phases are available, these can be included as additional data (Pannu et al., 1998 ▶). More recently, it has become possible to refine directly against anomalous data for the cases of SAD (Skubák et al., 2004 ▶) and SIRAS (Skubák et al., 2009 ▶) without the need for estimated phases and phase probabilities. REFMAC5 will also now refine against twinned data (Lebedev et al., 2006 ▶), automatically recognising the twin laws and estimating the corresponding twin fractions. The nonprotein contents of the crystal are often of most interest, such as bound ligands, cofactors, metal sites etc. Correct refinement at moderate or low resolution requires a knowledge of the ideal geometry together with associated uncertainties. In REFMAC5 this is handled through a dictionary of possible ligands (Vagin et al., 2004 ▶), with details held in mmCIF format. Dictionary files can be created through the tools SKETCHER and JLIGAND. Refinement goes hand-in-hand with rounds of model building which add/subtract parts of the model and apply large structural changes that are beyond the reach of refinement. In addition to the automated procedures of Buccaneer and RAPPER described above, there are many model-building tools in Coot (Emsley et al., 2010 ▶). A ccp4i interface to the popular ARP/wARP model-building package (Langer et al., 2008 ▶) has also been available for many years. 3.6. Validation, deposition and publication Validation is the process of ensuring that all aspects of the model are supported by the diffraction data, as well as con­forming with known features of protein chemistry. Although validation has traditionally been viewed as something that is performed at the end of structure determination, just before deposition, it is now appreciated that validation is an integral part of the process of structure solution, which should be carried out continually. CCP4 includes a wide variety of validation tools, all of which should be run to gain a complete picture of model quality. Coot (Emsley et al., 2010 ▶) has a dedicated drop-down menu of validation tools which can and should be applied as the model is being built. Coot can also extract warnings about particular links or outliers from a REFMAC5 log file. Warnings associated with specific atoms or residues are linked directly to the model as viewed in Coot. The ccp4i ‘Validation and Deposition’ module contains further validation tools. As mentioned above, SFCHECK (Vaguine et al., 1999 ▶) provides a number of measures of data quality, but if a model is provided it will also assess the agreement of the model with the data. Sequins (Cowtan, unpublished program) validates the assigned sequence against electron density (generated from experimental phases or from phases calculated from a side-chain omit process) and warns of mis­placed side chains or register errors. RAMPAGE (which is part of the RAPPER package; Furnham et al., 2006 ▶) provides Ramachandran plots based on updated ϕ–ψ propensities. PROCHECK is also included, although the Ramachandran plots are no longer generated, having been superseded by RAMPAGE. R500 (Henrick, unpublished program) checks the stereochemistry in a given PDB file against expected values and lists outliers in REMARK 500 records. The quaternary structure of the protein can be analysed with PISA (Krissinel & Henrick, 2007 ▶). This considers all possible interfaces in the crystal structure, estimates the free energy of dissociation, taking into account solvation and entropy effects, and predicts which interfaces are likely to be of biological significance. The CCP4 molecular-graphics program CCP4mg (Potterton et al., 2002 ▶, 2004 ▶) provides a simple means of generating publication-quality images and movies. As well as displaying coordinates in a wide variety of styles, CCP4mg can display molecular surfaces, electron density, arbitrary vectors and labels. The latest versions are built on the Qt toolkit, giving an enhanced look and feel (Fig. 3 ▶). Structures and views can be transferred between CCP4mg and Coot. 3.7. Jiffies and utilities In addition to the main functionality described above, the CCP4 suite contains a large number of utilities for performing format conversions and various analyses. Reflection data processed in other software packages can be imported with the utilities COMBAT, POINTLESS, SCALEPACK2MTZ, DTREK2SCALA and DTREK2MTZ, while data can be exchanged with other structure-solution packages with CONVERT2MTZ, F2MTZ, CIF2MTZ, MTZ2VARIOUS and MTZ2CIF. There are several useful utilities based on the Clipper library (Cowtan, 2003 ▶), such as CPHASEMATCH, which will compare two phase sets and look for changes in origin or hand. There are also many useful utilities for analysing coordinate files. New programs based on the MMDB library (Krissinel et al., 2004 ▶) include NCONT for listing atom contacts and PDB_MERGE for combining two PDB files. 4. Future plans At the heart of the CCP4 suite are the set of algorithms encoded in individual programs. As always, we include new programs in each major release of the suite and will continue to do so. Since the source of novel software is usually independent developers, the additions to the suite are not centrally planned. Nevertheless, some current themes are clearly recognisable, such as automated model building, in particular for low-resolution data. CCP4 also aims to enhance its functionality related to the maintenance and use of data on small molecules (ligands). Firstly, a considerably larger library of chemical compounds will be provided with the suite. Extended search functions will be provided to allow the efficient retrieval of known com­pounds or their close analogues. Secondly, existing functions for generating restraint data for new ligands will be enhanced by the inclusion of relevant software such as PRODRG (Schüttelkopf & van Aalten, 2004 ▶) into the suite, as well as by the development of new methods for structure reconstruction on the basis of partial similarity to structures in the library. Functionality will be available through a graphical front-end application, JLIGAND. In addition to the core programs, the infrastructure of CCP4 continues to evolve to support the latest working practices. The current CCP4 GUI, ccp4i, was a major innovation and has served us well for over ten years (Potterton et al., 2003 ▶). While it continues to provide a useful interface to the CCP4 suite, there are increasing demands from automation pipelines and users alike. In particular, there is a requirement to provide help on what to try next, advice which can be useful to both scientists and automated software. This depends on a robust assessment of the experimental data and the results of previous processing, which in turn requires good data management. We aim to address these issues through the development of a next-generation CCP4 interface. There will also be changes in the way that CCP4 is delivered to the end user. We have all become used to automated up­dates to the software we use (e.g. Windows Update, Synaptic for Debian-based Linux or application-specific updates such as for Firefox). Some CCP4 programs do alert the users to the availability of newer versions and CCP4mg (Potterton et al., 2002 ▶, 2004 ▶) will update the version on request. A CCP4-wide update mechanism is more difficult given the heterogeneous nature of the suite, but efforts in this direction are under way. A specific example of a remotely maintained crystallography platform is given by the US-based SBGrid Consortium. The CCP4 suite is downloaded to a user’s machine or a local server before being run. This is in contrast to many biology software tools, which are web-based. Reasons for running CCP4 locally include the wallclock time of jobs, the detailed control required and the size of data files. Nevertheless, there is increasing usage of web servers for crystallographic tasks. A server at York (http://www.ysbl.york.ac.uk/YSBLPrograms/index.jsp) runs a number of CCP4 programs, including BALBES and Buccaneer, while CCP4 programs are included in a number of other services, for example the ARP/wARP server at Hamburg (http://cluster.embl-hamburg.de/ARPwARP/remote-http.html). Plans are under way to make more CCP4 functionality available via the web. Finally, the coming years will see increasing integration of crystallography with other techniques, both experimental and theoretical. CCP4 aims to contribute towards efforts, such as the European infrastructure project INSTRUCT, to ease the transfer of data to and from these other domains.
                Bookmark

                Author and article information

                Journal
                Viruses
                Viruses
                viruses
                Viruses
                MDPI
                1999-4915
                09 November 2017
                November 2017
                : 9
                : 11
                Affiliations
                [1 ]Equipe Rétrovirus et Biochimie Structurale, Université de Lyon, CNRS, MMSB, UMR 5086 CNRS/Université de Lyon, IBCP, Lyon 69367 CEDEX 07, France; folio.christelle@ 123456gmail.com (C.F.); marie.dujardin@ 123456ibcp.fr (M.D.)
                [2 ]Laboratorio de Moléculas Bioactivas, Centro Universitario Regional Litoral Norte, Universidad de la República, Paysandú 60000, Uruguay; nataliasierraben@ 123456gmail.com (N.S.); guzmanalvarezlqo@ 123456gmail.com (G.A.)
                Author notes
                [* ]Correspondence: christophe.guillon@ 123456ibcp.fr ; Tel.: +33-(0)4-3765-2904
                Article
                viruses-09-00335
                10.3390/v9110335
                5707542
                29120364
                © 2017 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                Categories
                Article

                Microbiology & Virology

                crystal structure, feline immunodeficiency virus, fiv, capsid protein

                Comments

                Comment on this article