      Structural Basis for Design of New Purine-Based Inhibitors Targeting the Hydrophobic Binding Pocket of Hsp90


          Abstract

Inhibition of the molecular chaperone heat shock protein 90 (Hsp90) represents a promising approach for cancer treatment. BIIB021 is a highly potent Hsp90 inhibitor with remarkable preclinical anticancer activity; however, its clinical application has been limited by insufficient potency and response rates. In this study, we aimed to investigate the impact of replacing the hydrophobic moiety of BIIB021, 4-methoxy-3,5-dimethylpyridine, with various five-membered ring structures on binding to Hsp90. A focused array of N7/N9-substituted purines, featuring aromatic and non-aromatic rings, was designed with the size of hydrophobic pocket B of Hsp90 in mind, to obtain insights into their binding modes within the ATP binding site of Hsp90, in terms of both π–π stacking interactions in pocket B and the configuration of the outer α-helix 4. The target molecules were synthesized and evaluated for Hsp90α inhibitory activity in cell-free assays. Among the tested compounds, the isoxazole derivatives 6b and 6c and the sole six-membered derivative 14 showed favorable Hsp90α inhibitory activity, with IC50 values of 1.76 µM, 0.203 µM and 1.00 µM, respectively. Furthermore, compound 14 elicited promising anticancer activity against the MCF-7, SK-BR-3 and HCT116 cell lines. The X-ray structures of compounds 4b, 6b, 6c, 8 and 14 bound to the N-terminal domain of Hsp90 were determined to rationalize these results and to acquire additional structural insights that might enable further optimization of BIIB021.


Most cited references (52)


          PHENIX: a comprehensive Python-based system for macromolecular structure solution

1. Foundations

1.1. PHENIX architecture

The PHENIX (Adams et al., 2002) architecture is designed from the ground up as a hybrid system of tightly integrated interpreted ('scripted') and compiled software modules. A mix of scripted and compiled components is invariably found in all major successful crystallographic packages, but often the scripting is added as an afterthought in an ad hoc fashion using tools that predate the object-oriented programming era. While such ad hoc systems are quickly established, they tend to become a severe maintenance burden as they grow. In addition, users are often forced into many time-consuming routine tasks such as manually converting file formats. In PHENIX, the scripting layer is the heart of the system. With only a few exceptions, all major functionality is implemented as modules that are exclusively accessed via the scripting interfaces. The object-oriented Python scripting language (Lutz & Ascher, 1999) is used for this purpose. In about two decades, a large developer/user community has produced millions of lines of highly uniform, interoperable, mature and openly available sources covering all aspects of programming, ranging from simple file handling to highly sophisticated network communication and fully featured cross-platform graphical interfaces. Embedding crystallographic methods into this environment enables an unprecedented degree of automation, stability and portability. By design, the object-oriented programming model fosters shared collaborative development by multiple groups. It is routine practice to hierarchically recombine modules written by different groups into ever more complex procedures that appear uniform from the outside. A more detailed overview of the key software technology leading to all these advances, presented in the context of crystallography, can be found in Grosse-Kunstleve et al. (2002).

In addition to the advantages outlined in the previous paragraph, the scripting language is generally most efficient for the rapid development of new algorithms. However, runtime performance considerations often dictate that numerically intensive calculations are eventually implemented in a compiled language. The first choice of a compiled language is of course to reuse the same language environment as used for the scripting language itself, which is a C/C++ environment. Not only is this the mainstream software environment on all major platforms used today, but with probably hundreds of millions of lines of C/C++ sources in existence it is an environment that is virtually guaranteed to thrive in the long term. An in-depth discussion of the combined use of Python and C++ can be found in Grosse-Kunstleve et al. (2002) and Abrahams & Grosse-Kunstleve (2003). This model is used throughout the PHENIX system.
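The scripted-plus-compiled pattern described above can be seen in the open-source cctbx libraries that underpin PHENIX: lightweight Python objects delegate numerically intensive work to compiled C++ extensions. A minimal sketch, assuming a standard cctbx installation (the exact API may vary between versions):

    # Unit-cell arithmetic driven from Python, computed in compiled C++ (cctbx).
    from cctbx import uctbx

    # Construct a unit cell; the numerical work happens in the C++ layer.
    cell = uctbx.unit_cell((78.0, 78.0, 37.0, 90.0, 90.0, 90.0))

    print(cell.volume())      # cell volume in cubic Angstroms
    print(cell.d((1, 2, 3)))  # resolution of reflection (1, 2, 3) in Angstroms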
1.2. Graphical user interface

A new graphical user interface (GUI) for PHENIX was introduced in version 1.4. It uses the open-source wxPython toolkit, which provides a 'native' look on each operating system. Development has focused on providing interfaces around the existing command-line programs with minimal modification, using the same underlying configuration system (libtbx.phil) as used by most PHENIX programs as a template to automatically generate controls. Because these programs are implemented primarily as Python modules, complex data including models, reflections and other viewable data may be exchanged with the GUI without resorting to parsing log files.

The current PHENIX release (version 1.5) includes GUIs for phenix.refine (Afonine et al., 2005), phenix.xtriage (Zwart et al., 2005), the AutoSol (Terwilliger et al., 2009), AutoBuild (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Adams et al., 2008) and LigandFit (Terwilliger et al., 2006) wizards, the restraints editor REEL, all of the validation tools and several utilities for creating and manipulating maps and reflection files. More recent builds of PHENIX contain a new GUI for the AutoMR wizard and future releases will include a new interface for Phaser (McCoy et al., 2007). Intrinsically graphical data are visualized with embedded graphs (using the free matplotlib Python library) or a simple OpenGL viewer. This simplifies the most complex parameters, such as atom selections in phenix.refine, which can be visualized or picked interactively with the built-in viewer.

The GUI also serves as a platform for additional automation and user customization. Similarly to the CCP4 interface (CCP4i; Potterton et al., 2003), PHENIX manages data and task history for separate user-defined projects. Default parameters and input files can be specified for each project; for instance, the generation of ligand restraints from the phenix.refine GUI gives the user the option of automatically loading these restraints in future runs. The popularity of Python as a scientific programming language has led to its use in many other structural-biology applications, especially molecular-graphics software. The PHENIX GUI includes extension modules for the modeling programs Coot (Emsley & Cowtan, 2004) and PyMOL (DeLano, 2002), both of which are controlled remotely from PHENIX using the XML-RPC protocol. This allows the interfaces to integrate seamlessly; any model or map in PHENIX can be automatically opened in Coot with a single click. In programs that iteratively rebuild or refine structures, such as AutoBuild and phenix.refine, the current model and maps will be continually updated in Coot and/or PyMOL as soon as they are available. In the validation utilities, clicking on any atom or residue flagged for poor statistics will recentre the graphics windows on that atom. Remote control of the PHENIX GUI is also simple using the same protocol, and simple extensions to the Coot interface provide direct launching of phenix.refine with a model pre-loaded.
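The XML-RPC coupling described above is straightforward to reproduce with Python's standard library. The sketch below is illustrative only: the port number and the remote method name are assumptions for this example, not the documented PHENIX/Coot interface.

    # Hedged sketch of driving a molecular-graphics viewer over XML-RPC.
    # Port 40000 and the method name read_pdb are assumptions, for illustration.
    import xmlrpc.client

    viewer = xmlrpc.client.ServerProxy("http://127.0.0.1:40000")
    viewer.read_pdb("refined_model.pdb")  # ask the remote viewer to load a model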
2. Analysis of experimental data

PHENIX has a range of tools for the analysis, validation and manipulation of X-ray diffraction data. A comprehensive tool for analyzing X-ray diffraction data is phenix.xtriage (Zwart et al., 2005), which carries out tests ranging from space-group determination and detection of twinning to detection of anomalous signal. These tests provide the user and the various wizards with a set of statistics that characterize a data set. For analysis of twinning, phenix.xtriage consolidates a number of statistics to provide a balanced verdict of possible symmetry and twin-related issues with the data. Phenix.xtriage provides the user with feedback on the overall characteristics of the data. Routine usage of phenix.xtriage during or immediately after data collection has resulted in the timely discovery of twinning or other issues (Flynn et al., 2007; Kostelecky et al., 2009). Detection of these idiosyncrasies in the data typically reduces the overall effort in a successful structure determination.

A likelihood-based estimation of the overall anisotropic scale factor is performed using the likelihood formalism described by Popov & Bourenkov (2003). Database-derived standard Wilson plots for proteins and nucleic acids are used to detect anomalies in the mean intensity. These anomalies may arise from ice rings or other issues (Morris et al., 2004). Data strength and low-resolution completeness are also analysed. The presence of anomalous signal is detected by analysis of the measurability, a quantity expressing the fraction of statistically significant Bijvoet differences in a data set (Zwart, 2005). The native Patterson function is used to detect the presence of pseudo-translational symmetry. A database-derived empirical distribution of maximum peak heights is used to assign significance to detected peaks in the Patterson function.

A comprehensive automated twinning analysis is performed. Twin laws are derived from first principles to facilitate the identification of pseudo-merohedral cases. Amplitude and intensity ratios, 〈|E² − 1|〉 values, the L-statistic (Padilla & Yeates, 2003) and N(Z) plots are derived from data cut to the resolution limit suggested by the data-strength analysis. The removal of shells of data with relatively high noise content greatly improves the automated interpretation of these statistics. A Britton plot, H-test and a likelihood-derived approach are used to estimate twin fractions when twin laws are present. If a model has been supplied, an R versus R (Lebedev et al., 2006) analysis is carried out. This type of analysis is of particular use when dealing with pseudo-symmetry, space-group problems and twinning (Zwart et al., 2008).
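The 〈|E² − 1|〉 statistic mentioned above is easy to compute once normalized intensities are available; the reference expectations (about 0.736 for untwinned acentric data, about 0.541 for a perfect twin) are standard values from the twinning literature. A minimal sketch, assuming the normalized intensities are already in a NumPy array:

    import numpy as np

    def mean_abs_e2_minus_1(e_squared):
        """<|E^2 - 1|> twinning indicator from normalized intensities E^2."""
        e_squared = np.asarray(e_squared, dtype=float)
        return np.mean(np.abs(e_squared - 1.0))

    # Expected values: ~0.736 (untwinned acentric), ~0.541 (perfect twin).
    # Wilson statistics for acentric data give exponentially distributed E^2.
    e2 = np.random.exponential(scale=1.0, size=100_000)
    print(mean_abs_e2_minus_1(e2))  # prints a value close to 0.736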
To test for inconsistent indexing between different data sets, a set of reindexing laws is derived from first principles given the unit cells and space groups of the sample and reference data sets. A correlation analysis suggests the most likely choice of reindexing of the data. Analysis of the metric symmetry of the unit cell provides a number of likely point groups. A likelihood-inspired method is used to suggest the most likely point group of the data. Subsequent analysis of systematic absences in a likelihood framework ranks the resulting space-group possibilities (details to be published).

3. Substructure determination, phasing and molecular replacement

After ensuring that the diffraction data are sound and understood, the next critical necessity for solving a structure is the determination of phases using one of several strategies (Adams, Afonine et al., 2009).

3.1. Substructure determination

The substructure-determination procedure implemented as phenix.hyss (Hybrid Substructure Search; Grosse-Kunstleve & Adams, 2003) combines the multi-trial dual-space recycling approaches pioneered by Shake-and-Bake (Miller et al., 1994) and later SHELXD (Sheldrick, 2008) with the use of the fast translation function (Navaza & Vernoslova, 1995; Grosse-Kunstleve & Brunger, 1999). The fast translation function is the basis for a systematic search in the Patterson function (performed in reciprocal space), in contrast to the stochastic alternative of SHELXD (performed in direct space). Phenix.hyss is the only substructure-determination program to fully integrate automatic comparison of the substructures found in multiple trials via a Euclidean Model Matching procedure (part of the cctbx open-source libraries). This allows phenix.hyss to detect whether the same solution was found multiple times and to terminate automatically if this is the case. Extensive tests with a variety of SAD data sets (Grosse-Kunstleve & Adams, 2003) have led to a parameterization of the procedure that balances runtime considerations and the likelihood that repeated solutions present the correct substructure. In many cases the procedure finishes in seconds if the substructure is detectable from the input data.

3.2. Phasing

Phaser, available in PHENIX as phenix.phaser, applies the principle of maximum likelihood to solving crystal structures by molecular replacement, by single-wavelength anomalous diffraction (SAD) or by a combination of both. The likelihood targets take proper account of the effects of different sources of error (and, in the case of SAD phasing, their correlations) and allow different sources of information to be combined. In solving a molecular-replacement problem with a number of different components, the information gained from a partial solution increases the signal in the search for subsequent components. Because the likelihood scores for different models can be directly compared, decisions among models can readily be made as part of automation strategies (discussed below).

3.3. Noncrystallographic symmetry (NCS)

Noncrystallographic symmetry is an important feature of many macromolecular crystals that can be used to greatly improve electron-density maps. PHENIX has tools for the identification of NCS and for using NCS and multiple crystal forms of a macromolecule in phase improvement. Phenix.find_ncs and phenix.simple_ncs_from_pdb are tools for the identification of noncrystallographic symmetry in a structure using information from a heavy-atom substructure or an atomic model. Phenix.simple_ncs_from_pdb will identify NCS and generate transformations from the chains in a model in a PDB file. Phenix.find_ncs will identify NCS from either a heavy-atom substructure (Terwilliger, 2002a) or the chains in a PDB file and will then compare this NCS with the density in a map to verify that the NCS is actually present. Phenix.multi_crystal_average is a method for combining information from several crystal forms of a structure. It is especially well suited to cases where each crystal form has its own NCS, adjusting phases for each crystal form so that all the NCS copies in all crystals are as similar as possible.

NCS restraints should normally be applied in density modification and model building in all cases except where there is clear evidence that NCS is not present. In density modification within PHENIX the presence of NCS is identified from the heavy-atom sites or from an atomic model if available. The local correlation of density in NCS-related locations is then used automatically to set variable restraints on NCS symmetry in the map. In refinement, NCS symmetry is applied through coordinate restraints, targeting the positions of each NCS copy relative to those of the other NCS-related chains. The default NCS restraints in PHENIX are very tight, with targets of 0.05 Å r.m.s. At resolutions lower than about 2.5 Å these tight restraints on NCS should usually be applied. At higher resolutions it may be appropriate to use looser restraints or to remove them altogether. Additionally, if there are segments of the chains that clearly do not obey the NCS relationships, they should be excluded from the NCS restraints. Normally this is performed automatically, but it can also be specified explicitly.
4. Model building, ligand fitting and nucleic acids

Key steps in the analysis of a macromolecular crystal structure are building an initial core model, identification and fitting of ligands into the electron-density map and building an atomic model for loop regions that are less well defined than the majority of the structure. PHENIX has tools for rapid model building of secondary structure and main-chain tracing (phenix.find_helices_strands) and for the fitting of flexible ligands (phenix.ligandfit), as well as for fitting a set of ligands to a map (phenix.find_all_ligands) and for the identification of ligands in a map (phenix.ligand_identification). PHENIX additionally has a tool for the fitting of missing loops (phenix.fit_loops). Validation tools are provided so that the models produced can be validated at each step along the way.

4.1. Model building

Phenix.find_helices_strands will rapidly build a secondary-structure-only model into a map or very rapidly trace the polypeptide backbone of a model into a map. To build secondary structure in a map, phenix.find_helices_strands identifies α-helical regions and β-strand segments, models idealized helices and strands into the corresponding density, allowing for bending of the helices and strands, and assembles these into a composite model. To very rapidly trace the main chain in a map, phenix.find_helices_strands finds points along ridgelines of high density where Cα atoms might be located, identifies pairs and then triplets of these Cα atoms that have density between the atoms and plausible geometry, constructs all possible connections of these Cα atoms into nonamers and then identifies all the longest possible chains that can be made by joining the nonamers. This process can build a Cα model at a rate of about 20 residues per second, yielding a backbone model that can readily be interpreted visually or automatically to evaluate the quality of the map that it is based on.

Phenix.fit_loops will fit missing loops in an atomic model. It uses RESOLVE model building (Terwilliger, 2003a,b,c) to extend the chain from either end where a loop is missing and to connect the chains into a loop with the expected number of residues.

4.2. Ligand fitting

Phenix.ligandfit is a tool for fitting a flexible ligand into an electron-density map (Terwilliger et al., 2006). The key approaches used are breaking the ligand into its component rigid-body parts, finding where each of these can be placed into density, tracing the remainder of the ligand based on the positions of these core rigid-body parts and recombining the best parts of multiple fits while scoring based on the fit to the density.

Phenix.find_all_ligands is a tool for finding all the instances of each of several ligands in an electron-density map. Phenix.find_all_ligands finds the largest contiguous region of unused density in a map and uses phenix.ligandfit to fit each supplied ligand into that density. It then chooses the ligand that has the highest real-space correlation to the density (Terwilliger, Adams et al., 2007) and repeats this process until no ligands can be satisfactorily fitted into any remaining density in the map.
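The find-all-ligands procedure just described is essentially a greedy search over density blobs. A schematic sketch of that control flow (every helper function here is a hypothetical placeholder, not PHENIX API):

    # Greedy ligand placement, schematically. Helper functions are hypothetical.
    def find_all_ligands(map_data, ligands, min_correlation=0.7):
        placements = []
        while True:
            blob = largest_unused_density_region(map_data)    # hypothetical
            best = None
            for ligand in ligands:
                fit = fit_ligand_into_density(ligand, blob)   # hypothetical
                if best is None or fit.correlation > best.correlation:
                    best = fit  # keep the ligand with the best real-space fit
            if best is None or best.correlation < min_correlation:
                break  # nothing fits the remaining density well enough
            placements.append(best)
            mark_density_as_used(map_data, best)              # hypothetical
        return placements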
Phenix.ligand_identification is a tool for identifying which ligands are compatible with unknown electron density in a map (Terwilliger, Adams et al., 2007). It can search using the 200 most common ligands from the PDB or a user-supplied list of ligands. Phenix.ligand_identification uses phenix.ligandfit to fit each ligand to the map and identifies the best-fitting ligand using the real-space correlation and the surface complementarity of the ligand and the atoms in the structure surrounding the ligand-binding site.

4.3. RNA and DNA

In common with most macromolecular crystallographic tools, PHENIX was originally developed with protein structures primarily in mind. Now that nucleic acids, and especially RNA, are increasingly important in large biological structures, the system is being modified in places where subtle differences in procedure are needed, rather than just the relevant libraries. Model building in phenix.autobuild now has a preliminary set of nucleic acid procedures that take advantage of the relatively well determined phosphate and base positions, as well as the preponderance of double helix, and that make use of the RNA backbone conformers recently defined by the RNA Ontology Consortium (Richardson et al., 2008). Nucleic acid structures benefit significantly from torsion-angle refinement, which has recently been added to the options in phenix.refine.

A principal problem in RNA models is getting the ribose pucker correct, although it is known to consist almost entirely of either C3′-endo (which is more common and is the pucker found in the A-form helix) or C2′-endo (Altona & Sundaralingam, 1972). MolProbity uses the perpendicular distance from the 3′ phosphate to the line of the C1′—N1/9 glycosidic bond as a reliable diagnostic of ribose pucker (Davis et al., 2007; Chen et al., 2010). This same test has now been built into phenix.refine to allow the use of pucker-specific target parameters for bond lengths, angles and torsions (Gelbin et al., 1996), rather than the uneasy compromise values (Parkinson et al., 1996) used in most pucker-agnostic refinement. Currently, if an incorrect pucker is diagnosed it must usually be fixed by user rebuilding, for instance in Coot (Emsley & Cowtan, 2004) or in RNABC (Wang et al., 2008). A rebuilding functionality will probably be incorporated into PHENIX soon, but in the meantime the refinement will now correctly maintain the geometry of a C2′-endo pucker once it has been built and identified using conformation-specific residue names.

4.4. Maps, models and avoiding bias

Phenix.refine (and the graphical tool phenix.create_maps) can produce various types of maps, including anomalous difference, maximum-likelihood weighted (p·mF_obs − q·D·F_model)exp(iα_model) and regular (p·F_obs − q·F_model)exp(iα_model) maps, where p and q are any user-defined numbers, as well as filled and kick maps. The coefficients m and D of likelihood-weighted maps (Read, 1986) are computed using test-set reflections as described in Lunin & Skovoroda (1995) and Urzhumtsev et al. (1996).

Data incompleteness, especially systematic incompleteness, can cause map distortions (Lunin, 1988; Tronrud, 1997). An approach to remedying this problem is to replace ('fill') missing observations with nonzero values. One can use D·F_model (similarly to REFMAC; Murshudov et al., 1997) to replace a missing F_obs, or use 〈F_obs〉, where the F_obs are averaged across a resolution bin around the missing F_obs value. Based on a limited number of tests, both 'filling' schemes produce similar results, reiterating the importance of phases. However, it is important to keep in mind that replacing missing F_obs risks introducing bias, and obviously the more incomplete the data, the larger the risk. At present it is advisable to use both maps simultaneously: filled and not filled.
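To make the filling scheme concrete, the sketch below builds 2mF_obs − DF_model coefficients (p = 2, q = 1 in the notation above) and substitutes D·F_model where F_obs is missing. The array layout and the simplified phase handling are assumptions for illustration, not the phenix.refine implementation:

    import numpy as np

    def filled_map_coeffs(f_obs, m, d, f_model_complex):
        """2mFo - DFc map coefficients; missing Fobs (NaN) filled with D*Fc.

        f_obs: observed amplitudes, NaN where unmeasured.
        m, d:  per-reflection likelihood weights.
        f_model_complex: complex model structure factors (carry the phase).
        """
        amp_model = np.abs(f_model_complex)
        # Unit-modulus phase factor exp(i*alpha_model), guarding against 0/0.
        phase = f_model_complex / np.where(amp_model == 0.0, 1.0, amp_model)
        coeffs = 2.0 * m * f_obs - d * amp_model          # standard 2mFo - DFc
        filled = np.where(np.isnan(f_obs), d * amp_model, coeffs)  # 'fill' step
        return filled * phase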
An average kick map (AK map; Gunčar et al., 2000; Turk, 2007; Pražnikar et al., 2009) is the result of the following procedure. A large ensemble of structures is created in which the coordinates of each structure in the ensemble are all randomly shaken. A map is then computed for each structure. Finally, all maps are averaged to generate one AK map. An AK map is expected to have less bias and less noise and to enhance the existing signal, and it can potentially clarify some initially bad densities.
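The average-kick procedure is simple enough to express in a few lines. In this sketch, the shake amplitude and the map routine compute_map are placeholders, not PHENIX calls:

    import numpy as np

    def average_kick_map(coords, compute_map, n_members=100, shake=0.3):
        """Average map over an ensemble of randomly 'shaken' models.

        coords: (n_atoms, 3) coordinate array.
        compute_map: hypothetical function returning a density grid for a
        given set of coordinates.
        """
        acc = None
        for _ in range(n_members):
            shaken = coords + np.random.normal(0.0, shake, coords.shape)
            density = compute_map(shaken)
            acc = density if acc is None else acc + density
        return acc / n_members  # averaging suppresses model-specific noise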
A computationally intensive but powerful method of creating a very low-bias map is to carry out iterative model building and refinement while omitting one region of the map from all calculations of structure factors (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Adams et al., 2008). The phenix.autobuild iterative-build OMIT map procedure carries this out automatically for either a single OMIT region or for overlapping OMIT regions to create a composite iterative-build OMIT map.

5. Model, and model-to-data, validation

The result of crystallographic structure determination is the atomic model. There are three principal components in assessing model quality: the covalent model geometry, the model stereochemistry and the quality of fit between the model and the experimental data in both real space and reciprocal space. All three provide overall measures, and the first two plus the real-space aspect of the third also provide checks for local outliers, which give the best leverage for user intervention to actively improve model accuracy (Arendall et al., 2005). (Validation of the experimental data was described in §2 above.) PHENIX includes many individual tools for specific aspects of validation, plus several systems that combine those results into overall summaries. Validation is provided both for user evaluation of the progress and results of a structure solution and also to help inform the automated choices made by other parts of the system.

Most aspects of the MolProbity model-validation tools (Davis et al., 2007; Chen et al., 2010) have been adapted or rewritten for integrated use within PHENIX and are presented to the user by the new GUI (§1.2). H atoms are added by phenix.reduce, with optimization of entire local hydrogen-bond networks, consideration of the first layer of crystallographic waters and optional correction of side-chain amide or histidine 180° 'flips' (Word, Lovell, Richardson et al., 1999). All-atom contacts (Word, Lovell, LaBean et al., 1999) are calculated by phenix.probe, which provides the atomic overlap information needed for the validation of serious all-atom steric clashes and can also be visualized in Coot. For the PHENIX GUI, the set of MolProbity-based tools provides both overall model statistics, such as clashscore and percentage of outliers, and detailed lists of the Ramachandran (Lovell et al., 2003), rotamer (Lovell et al., 2000), Cβ deviation (Lovell et al., 2003) and clash outliers. Command-line tools are available for these validation methods: phenix.rotalyze, phenix.ramalyze, phenix.cbetadev, phenix.clashscore, phenix.reduce and phenix.probe. Additionally, phenix.validate_model, which analyzes the deviations of bond lengths, bond angles, planarity etc. from ideal library values, complements the MolProbity torsional and atomic clash tools.

Phenix.real_space_correlation assesses the local model-to-data correspondence by providing a quantitative measure of how well the atomic model fits the electron-density map at the residue or atom level (depending on the resolution).

Rapidly obtaining a snapshot of global figures of merit for a crystallographic model and associated experimental data is a frequent task that is performed at all stages of structure solution. This task can be complicated for several reasons: the presence of novel ligands or nonstandard residues in the PDB-format (Berman et al., 2000) coordinate file, data collected from twinned crystals, various reflection datafile formats, different representations of atomic displacement parameters in the presence of TLS (Schomaker & Trueblood, 1968), the experimental data type (X-ray and/or neutron), files with multiple models and various formatting issues. Phenix.model_vs_data is designed to handle all these complications automatically with minimal user input (a PDB file and a reflection data file) and to provide a concise summary output.

Phenix.polygon (Urzhumtseva et al., 2009) is a graphical tool designed to indicate the similarity of validation parameters, such as the free R value, for a particular structure compared with those deposited in the PDB. This comparison is performed against all other structures solved at similar resolution limits, and the result is presented graphically. Phenix.validation combines all of the tools described above in one GUI, providing a single place for assessing the results of structure determination.

5.1. Model and structure-factor manipulation and analysis

PHENIX has a range of tools for displaying, analyzing and manipulating structure-factor and model information. Phenix.mtz.dump and phenix.cif_as_mtz display and convert structure-factor data. Phenix.print_sequence, phenix.pdb_atom_selection and phenix.pdbtools display and manipulate coordinate files. Phenix.tls is a tool for the extraction and manipulation of TLS information. Using this tool, TLS matrices and selections can be extracted from REFMAC- or PHENIX-formatted PDB file headers and the total or residual atomic B factors can be computed and output. Future functionality will include the complete analysis of TLS matrices and their graphical visualization. Phenix.get_cc_mtz_mtz and phenix.get_cc_mtz_pdb are tools for analyzing the agreement between maps based on a pair of MTZ files or between maps calculated from an MTZ file and a PDB file. The key attributes of these tools are that they automatically search all allowed origin shifts that might relate the two maps and that they write out a modified version of one of the MTZ files or of the PDB file, shifted to match the other.

6. Structure refinement

Phenix.refine is the state-of-the-art crystallographic structure-refinement engine of PHENIX. The foundational refinement machinery is a combination of highly efficient programming tools and new or rethought crystallographic algorithms. Phenix.refine possesses an extensive set of tools that cover the majority of refinement scenarios at any data resolution from low to ultrahigh. Various reflection-data formats (for example, CNS, MTZ and SHELX) are recognized automatically. The input experimental data are checked for outliers (Read, 1999; Zwart et al., 2005) and any reflections identified as such are excluded from the refinement calculations. Twinning can also be taken into account by providing a twin-law operator, which can be obtained using phenix.xtriage.
X-ray and/or neutron diffraction data can be used, and an option for joint XN refinement is available (simultaneous refinement against X-ray and neutron data; Adams, Mustyakimov et al., 2009). Each refinement run begins with robust mask-based bulk-solvent correction and anisotropic scaling (Afonine et al., 2005). Tools such as efficient rigid-body refinement (multiple-zones algorithm; Afonine et al., 2009), simulated-annealing refinement (Brünger et al., 1987) in Cartesian or torsion-angle space (Grosse-Kunstleve et al., 2009) and automatic NCS detection and its use as restraints in refinement are important at low resolution and in the initial stages of refinement. A broad range of atomic displacement parameterizations is available, including grouped isotropic, constrained anisotropic (TLS) and individual atomic isotropic or anisotropic, allowing efficient modelling of atomic displacement parameters at any resolution. Occupancy refinement (grouped, individual, group constrained for alternative conformations or any mixture) can be performed for any user-defined atoms. Atoms in alternative conformations are recognized automatically based on altLoc identifiers in the input PDB file and their occupancies are refined by default. Ordered-solvent (water) model updating is integrated into the refinement process.

The availability of ultrahigh-resolution data makes it possible to visualize the residual density arising from bonding effects; phenix.refine employs a novel interatomic scatterers model (Afonine et al., 2007) to adequately account for these features. A flexible parameterization of H atoms allows their use at any resolution, from subatomic (where their parameters can be refined individually) to low resolution (where a riding model is used). Refinement can be performed using a variety of refinement target functions, including maximum likelihood, maximum likelihood with experimental phase information and amplitude least squares. The refinement of coordinates can be performed in real or reciprocal space (allowing dual-space refinement). Novel ligands can easily be included in refinement by providing a corresponding CIF file as input (the CIF file can be created automatically using phenix.ready_set).

Manual fixing of amino-acid side-chain rotamers can be time-consuming, especially for large structures. Although the use of simulated-annealing refinement increases the convergence radius, it can still fail to fit incorrectly modelled side chains into the correct density. Phenix.refine has an option for automatic selection of the best rotamer based on a rotamer library (Lovell et al., 2000) and optimal fit into the density (details to be published elsewhere). Furthermore, coupling real-space refinement with the built-in rotamer library and available MolProbity tools allows the automated identification and robust correction of common systematic errors involving backward-fit conformations for Leu, Thr, Val, Ile and Arg side chains, as developed and tested in the Autofix method (Headd et al., 2009).

Phenix.refine allows multi-step complex refinement protocols in which most of the available refinement strategies can be combined with each other and applied to any selected part of the model. For example, a run of phenix.refine may perform rigid-body refinement, simulated annealing, individual and grouped B factors combined with TLS refinement, constrained occupancy refinement and automatic water picking.
The output of phenix.refine includes various maps (maximum-likelihood weighted, kicked, incompleteness corrected, anomalous difference and those with any user-defined coefficients), complete model and data statistics and a PDB file with a formatted REMARK 3 header ready for PDB deposition. The phenix.refine GUI is integrated with Coot and PyMOL, allowing seamless visual analysis of the refined model and associated maps. Phenix.refine is tightly integrated with other PHENIX components, making structure solution, building and refinement a one-step process (for example, in the AutoMR and AutoBuild wizards). It is routinely tested by automatic re-refinement of all models in the PDB for which the experimental data are available.

6.1. Ligand-coordinate and restraint-geometry generation

The electronic Ligand Builder and Optimization Builder (eLBOW; Moriarty et al., 2009) is a suite of tools designed for the reliable generation of Cartesian coordinates and geometry restraints for both novel and known ligands. In line with the rest of the PHENIX package, the eLBOW modules are written in Python, with the numerically intensive portions of the code written in C++. eLBOW is a flexible platform for converting a majority of common chemical inputs into optimized three-dimensional coordinates and geometry restraints for refinement. Ligand geometries can be minimized using the semi-empirical AM1 quantum-chemical method (Stewart, 2004), a numerically efficient and chemically accurate technique for the class of molecules commonly complexed with or bound to proteins.

In addition, a graphical user interface for editing geometry restraints and simple geometry manipulation of ligands has been developed. The Restraints Editor, Especially Ligands (REEL) removes the tedium of manually editing a restraints file by providing a number of commonly performed actions via pull-down menus and other interactive features. The effect of changes in the restraints can be immediately reflected in the molecule view to provide user feedback.

A tool that uses many of the features of eLBOW to quickly and easily prepare a protein model for refinement is known as ReadySet! The flexibility of the Python interface is exemplified by the use of Reduce, eLBOW and several smaller portions of the cctbx toolkit to add H and/or D atoms to the model, ligands and water and to generate metal-coordination files and geometry restraints for unknown ligands. The files required for covalently bound ligands are also generated.

7. Integrated structure determination

7.1. Why automation?

Automation has dramatically changed macromolecular crystallography over the past decade, both by greatly speeding up the process of structure solution, model building and refinement and by bringing the tools for structure determination to a much wider group of scientists. As automation becomes increasingly comprehensive, it will allow users to test many more possibilities for structure determination, will allow improved estimation of uncertainties in the final structures and will allow the determination of ever more complex and difficult structures.

The PHENIX environment has been developed with automation as a key and defining feature. Each tool within PHENIX can seamlessly and nearly effortlessly be incorporated as part of any other tool or process in PHENIX. This means that very complex tasks can be built up from well tested and characterized tools and that tools and higher-level methods can be re-used in many different contexts.
With a fully automatic regression-testing system as an integral part of the PHENIX environment, all these tasks and high-level methods are tested daily to ensure the integrity of the entire PHENIX system.

7.2. Automated structure solution

PHENIX has fully integrated structure-solution capability for both experimental phasing (MAD, SAD, MIR and combinations of these), carried out by phenix.autosol, and for molecular replacement, performed by phenix.automr. Each of these automated procedures feeds directly into the iterative model building, density modification and refinement of phenix.autobuild.

Phenix.autosol is designed to allow complete automation of experimental phasing while allowing a high degree of flexibility for advanced users. Beginning with structure-factor amplitudes and the sequence of the macromolecule, phenix.autosol uses phenix.solve (Terwilliger & Berendzen, 1999) to scale all data sets, phenix.xtriage (Zwart et al., 2005) to analyze the data for twinning and to correct any anisotropy in the data and phenix.hyss (Grosse-Kunstleve & Adams, 2003) to find potential heavy-atom or anomalously scattering atoms. Phenix.autosol carries out experimental phasing with phenix.phaser (McCoy et al., 2004, 2007) or phenix.solve (Terwilliger & Berendzen, 1999), density modification with phenix.resolve (Terwilliger, 1999) and preliminary model building using the methods in phenix.autobuild (Terwilliger, Grosse-Kunstleve, Afonine, Moriarty, Zwart et al., 2008).

A key step in automated structure solution is the identification of which of several possible space-group and heavy-atom or anomalously scattering-atom substructures is correct. Phenix.autosol uses a Bayesian scoring algorithm based on analysis of the experimental electron-density maps to identify which substructures lead to the best maps (Terwilliger et al., 2009). The main features of the maps that are used in this evaluation are the skewness of the electron density (a non-Gaussian histogram of density with more density in the positive tail than in the negative tail) and the correlation of local r.m.s. density (large contiguous regions of high variation where the molecule is located and separate large contiguous regions of low variation where the solvent is located).
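Of the two map-quality features just described, density skewness is especially easy to illustrate. A minimal sketch using standard NumPy/SciPy (the map array itself is assumed to come from elsewhere):

    import numpy as np
    from scipy.stats import skew

    def map_skewness(density_grid):
        """Skewness of the map histogram; markedly positive values suggest
        real molecular features rising above a flat solvent background."""
        return skew(np.ravel(density_grid))

    # A protein-like map has a long positive density tail, so its skewness
    # should exceed that of a featureless, roughly Gaussian noise map.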
Phenix.autosol is highly flexible, allowing any combination of experimental data, such as MAD + SIRAS or several SAD data sets. Although it is fully automated, the user can control nearly all aspects of the operation of the procedure, including the scoring criteria and decisions about how certain phenix.autosol should be that the correct solution is contained in the current list of solutions. Phenix.autosol can carry out phasing using a combination of experimental SAD data and molecular-replacement information. If a molecular-replacement model is available, phenix.autosol will use phenix.phaser (McCoy et al., 2004, 2007) to complete the anomalous substructure iteratively by constructing log-likelihood gradient maps for the anomalous scatterers based on the model of the non-anomalous structure and any anomalous scatterers that have already been found. The anomalous substructure is then used along with the model to calculate phases with phenix.phaser.

Phenix.automr carries out automated likelihood-based molecular replacement using phenix.phaser (Read, 2001; McCoy et al., 2005, 2007; McCoy, 2007). The procedure is highly automated, allowing several copies of each of several components to be placed in a single run, which can also test different possible choices of space group. If there are alternative choices of model for a component, the molecular-replacement calculation can try each of them in turn or combine them as a statistically weighted ensemble. Although the evaluation of the likelihood targets is slow (Read, 2001), the use of fast approximations for the rotation search (Storoni et al., 2004) and the translation search (McCoy et al., 2005) gives run times that are competitive with traditional Patterson-based methods. Likelihood has been demonstrated to be more sensitive to the correct solution, particularly in difficult cases (Read, 2001). When there are several copies or several components to place, the ability of the likelihood functions to take advantage of preliminary partial solutions can provide a crucial increase in the signal.

7.3. Iterative model building, density modification and refinement

Phenix.autobuild is a highly integrated and automated procedure for model building and model improvement through iterative model building, density modification and refinement. Phenix.autobuild uses phenix.resolve (Terwilliger, 2003a,b) to carry out model building, model extension, model assembly, loop fitting and building outside existing models. It further uses phenix.resolve to improve electron-density maps with statistical density modification, including information from the newly built models as well as that obtained from experiment (e.g. phenix.autosol), from NCS (Terwilliger, 2002b) and from other expected features of electron-density maps, such as a flat solvent (Wang, 1985), the presence of secondary-structural features (Terwilliger, 2001) and the presence of local patterns of density characteristic of macromolecules (Terwilliger, 2003c). To reduce model bias in the procedure, prime-and-switch phasing can also be used (Terwilliger, 2004). Phenix.autobuild uses phenix.refine (Afonine et al., 2005) throughout this process to improve the quality of the models that are built.

Phenix.autobuild provides two complementary approaches to model building. For cases in which no model or only a preliminary model has been built, phenix.autobuild will construct a new model, considering the main chain of any supplied models as potential coordinates. In cases where a nearly final model is available, phenix.autobuild can apply a rebuild-in-place approach in which the polypeptide chain is rebuilt a few residues at a time without changing the register or the overall features of the model. The rebuild-in-place approach in phenix.autobuild provides a powerful method for the assessment of uncertainties in an atomic model by repetitive rebuilding of the model using different random seeds for each iteration (Terwilliger, Grosse-Kunstleve et al., 2007). The variability in the coordinates of each atom in the ensemble that is created is a lower bound on the uncertainty of the position of that atom.
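The per-atom variability measure described for rebuild-in-place ensembles amounts to a coordinate spread across models. A small sketch, assuming the ensemble is already aligned and stored as a NumPy array:

    import numpy as np

    def per_atom_spread(ensemble):
        """Per-atom r.m.s. deviation from the ensemble mean position.

        ensemble: (n_models, n_atoms, 3) array of aligned coordinates.
        Returns an (n_atoms,) array; a lower bound on positional uncertainty.
        """
        mean = ensemble.mean(axis=0)     # (n_atoms, 3) mean positions
        diff = ensemble - mean           # per-model deviations
        return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))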
8. Conclusions

Advances in computational methods and algorithms have made it possible to automate the solution of many structures with PHENIX. However, many challenges still exist. In particular, the development of automated methods that can be applied at low resolution (worse than 3.0 Å) remains a priority. In this resolution range there are typically too few experimental data to uniquely define the macromolecular structure for automated ab initio model building. Thus, methods are required that rely on prior knowledge from existing macromolecular structures to permit productive automated data interpretation. These methods will need to be developed and applied for all stages of structure solution and tightly integrated to maximize the information extracted from the experimental data.

Coot: model-building tools for molecular graphics

Acta Crystallographica Section D: Biological Crystallography, 60(12), 2126–2132

              MolProbity: all-atom structure validation for macromolecular crystallography

1. Summary of MolProbity flow and user interactions

The usual interaction with MolProbity (Davis et al., 2007) is through the internet at http://molprobity.biochem.duke.edu or as a main menu item on our general laboratory website at http://kinemage.biochem.duke.edu. [For bulk users, it is also possible to set up your own local MolProbity server or to use the individual programs in command-line mode.] Tutorial exercises for the whole process of diagnosing and fixing errors can be found on the kinemage site under Teaching/MolProbity.

A typical MolProbity session starts with the user uploading a coordinate file of their own or fetching one from the PDB or NDB databases (Berman et al., 1992, 2000) in new or old PDB format or in mmCIF format. After checking the thumbnail image and listed characteristics of the input file and editing or reloading if needed, H atoms are added and optimized, with automated correction of Asn/Gln/His 180° flips if needed (Word, Lovell, Richardson et al., 1999). The user then chooses which validation analyses to run and what reports and output files to generate. The MolProbity interface adjusts the defaults and options presented, and even the page flow, depending on user choices and on the properties of the file being worked on. These adjustments make MolProbity simple for novice users, while at the same time allowing advanced users great control over their runs.

The core 'glue' that generates the HTML code controlling the main user interface and programmatic interactions of MolProbity is implemented in the PHP programming language. Underlying the PHP core, the majority of the analysis tasks in MolProbity are performed by individual programs written in a range of languages, including C, C++, Java and Perl. It uses REDUCE and PROBE for all-atom contact analysis, RAMALYZE, ROTALYZE, DANGLE, SILK and SUITENAME for other criteria and KiNG for three-dimensional visualization of the structure and its validation markers directly in the browser. Fig. 1 shows a key to MolProbity's graphical markers for validation outliers. Further details are provided below on the specific analyses that MolProbity can perform. The validation results are reported in the form of summaries, charts, two-dimensional and three-dimensional graphics and output files for download.

The crucial final step in the MolProbity process is for the crystallographer to download the result files and work off-line to correct as many of the diagnosed problems as feasible. Rebuilding with consideration of the validation outliers, the electron density and the surrounding model is usually performed either in Coot (Emsley & Cowtan, 2004) or in KiNG (Chen et al., 2009). At resolutions of about 2.5 Å or better it is possible to correct the great majority of outliers (Arendall et al., 2005), with an order-of-magnitude improvement in the various MolProbity scores and some improvement in geometry, map quality, R factor and R free. An example is shown in Fig. 2 with before-and-after multi-criterion kinemages.

2. Validation analyses

2.1. Addition of H atoms

The presence of H atoms (both nonpolar and polar) is a critical prerequisite for all-atom contact analysis. Although refinement using H atoms is becoming more common, most crystal structures are still deposited without H atoms. Once a PDB structure file has been uploaded, MolProbity detects whether the file contains a suitable number of H atoms; if not, then the 'Add H atoms' option is presented to users first.
MolProbity uses the software REDUCE (Word, Lovell, Richardson et al., 1999) to add and optimize hydrogen positions in both protein and nucleic acid structures, including ligands, but does not add explicit H atoms to waters. OH, SH and NH3 groups (but not methyl groups) are rotationally optimized and His protonation is chosen within each local hydrogen-bond network, including interactions with the first shell of explicit waters. A common problem is that the side-chain ends of Asn, Gln and His are easily fitted 180° backwards, since the electron density alone cannot usually distinguish the correct choice of orientation. REDUCE can automatically diagnose and correct these types of systematic errors by considering all-atom steric overlaps as well as hydrogen bonding within each local network. Automatic correction of Asn/Gln/His flips is the default option in MolProbity during addition of H atoms. MolProbity presents each potential flip correction to the user in kinemage view, so they have the option of inspecting the before-and-after effects of each flip and approving (or rejecting) each correction. Fig. 3 shows an example of a simple Gln flip that is unquestionably correct but that could not have been decided on the basis of hydrogen bonding alone. Other examples can be much more complex, with rotatable OH positions, large hydrogen-bond networks and multiple competing interactions evaluated exhaustively. Users can also choose to add H atoms without Asn/Gln/His flips, which is useful in evaluating the atomic coordinates as they were deposited, but which rejects the easiest and most robustly correct improvement that can be made in a crystallographic model (Word, Lovell, Richardson et al., 1999; Higman et al., 2004). If flips are performed, the user needs to download and use the corrected PDB file (either with or without the H atoms) in order to benefit.

2.2. All-atom contact analysis

Once H atoms have been added to (or detected in) a structure, the complete 'Analyze all-atom contacts and geometry' option is enabled. A main feature of this option is the all-atom contact analysis, which is performed by the program PROBE (Word, Lovell, LaBean et al., 1999). PROBE operates by, in effect, rolling a 0.5 Å diameter ball around the van der Waals surfaces of atoms to measure the amount of overlap between pairs of nonbonded atoms. When non-donor–acceptor atoms overlap by more than 0.4 Å, PROBE denotes the contact as a serious clash, which is included in the reported clashscore and is shown in kinemage format as a cluster of hot-pink spikes in the overlap region (Fig. 1). Such large overlaps cannot occur in the actual molecule, but mean that at least one of the two atoms is modeled incorrectly. MolProbity allows users to select any combination of clashes, hydrogen bonds and van der Waals contacts to calculate and display on the structure. By default, all three are enabled for structures that are not excessively large; for large structures, van der Waals contacts are deselected.

The 'clashscore' is the number of serious clashes per 1000 atoms. It is reported in the MolProbity summary (top of Fig. 4), with a red/yellow/green color coding for absolute quality. The structure's percentile rank for clashscore value within the relevant resolution range is also given. In the detailed sortable 'multi-chart' (an extract is shown below the summary in Fig. 4), the worst clash ≥0.4 Å is listed for each residue and highlighted in pink.
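Because the clashscore is just a normalized count, it can be stated in a couple of lines. The overlap list here is a hypothetical input for illustration; PROBE's actual output format differs:

    def clashscore(overlaps, n_atoms):
        """Serious clashes per 1000 atoms, as defined by MolProbity.

        overlaps: iterable of nonbonded overlap distances in Angstroms
        (non-donor/acceptor pairs only); n_atoms: atom count including H.
        """
        serious = sum(1 for overlap in overlaps if overlap > 0.4)
        return 1000.0 * serious / n_atoms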
2.3. Torsion-angle combinations: updated Ramachandran and rotamer analyses

Also included in the 'Analyze all-atom contacts and geometry' option is an evaluation of where residues fall in the multi-dimensional distributions of Ramachandran backbone ϕ, ψ angles and side-chain rotamer χ angles. The reference distributions are currently from 100 000 residues in 500 files, quality-filtered at both the file and the residue level. The Ramachandran plots are separated for Gly, Pro and pre-Pro residue types; the general plot has only one in 2000 residues outside the 'allowed' contour, which is the same probability as a 3.5σ outlier in a normal distribution. The three specific plots can be robustly contoured only down to excluding one in 500 residues (about 3σ) in the current reference data, but will soon be updated. By 'robust' we mean that the contour does not shift with further improvement in resolution or B factor or with different subselections of the data. When values plateau in this way we can define clear absolute goals for the measure, such as 98% for Ramachandran favored.

For RNA, a reliable diagnostic of ribose pucker is the perpendicular distance from the 3′ phosphate to the line of the glycosidic bond: ≥2.9 Å for C3′-endo and <2.9 Å for C2′-endo. MolProbity checks this distance against the modeled sugar pucker, as well as outliers in individual ∊ or δ values. All such outliers are listed in the multi-chart and ribose-pucker outliers are flagged in the kinemage (Fig. 1). An example is shown in Fig. 6, where what should have been a C2′-endo pucker (by the short perpendicular) was fitted as an intermediate, unfavorable pucker close to the more common default C3′-endo pucker, also producing geometry and ∊ outliers.

High-dimensional analysis of the combinations of backbone torsion angles within an RNA 'suite' (the unit from sugar to sugar) has shown that there are distinct 'rotameric' backbone conformers. The RNA Ontology Consortium has defined a two-character nomenclature and an initial set of 54 favorable RNA backbone conformers (Richardson et al., 2008). We created the SUITENAME program to identify either the named conformer or an outlier for each suite in an RNA structure. These conformers and their 'suiteness' quality score are listed in the MolProbity multi-chart.

2.6. The overall MolProbity score

In response to user demand, the 'MolProbity score' provides a single number that represents the central MolProbity protein quality statistics. It is a log-weighted combination of the clashscore, the percentage of Ramachandran not-favored residues and the percentage of bad side-chain rotamers, giving one number that reflects the crystallographic resolution at which those values would be expected. Therefore, a structure with a numerically lower MolProbity score than its actual crystallographic resolution is, quality-wise, better than the average structure at that resolution. There is some distortion in the fit at very high or very low resolutions; for these ranges it is preferable to judge by the resolution-specific percentile score, which is also reported in the summary. Percentile scores are currently given for clashscore and for MolProbity score relative to the cohort of PDB structures within 0.25 Å of the file's resolution.
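A log-weighted combination of the three statistics can be sketched as below. The weights here are placeholders chosen only to show the functional form, not the published MolProbity calibration:

    import math

    def combined_quality_score(clashscore, pct_rama_not_favored,
                               pct_bad_rotamers,
                               w1=0.4, w2=0.3, w3=0.3):
        """Illustrative log-weighted combination of three quality statistics.
        The weights are assumptions, not the published calibration."""
        return (w1 * math.log(1.0 + clashscore)
                + w2 * math.log(1.0 + pct_rama_not_favored)
                + w3 * math.log(1.0 + pct_bad_rotamers))

The logarithms compress the heavy tails of each statistic, so one terrible criterion cannot completely dominate the combined number.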
3. Correction of outliers

3.1. Manual rebuilding

Except for Asn/Gln/His flip corrections, MolProbity does not yet directly include the ability to correct the errors it finds in structures; it relies on users having access to standalone local software for rebuilding and refinement. The standalone version of KiNG has some rebuilding tools for modeling side chains and making small local 'backrub' adjustments to structures, with the help of electron-density display, interactive contact dots and rotamer evaluation (Davis et al., 2006; Chen et al., 2009). Fig. 7 illustrates such a correction process in KiNG, rebuilding a backward-fitted leucine with a clash and a bad rotamer (one of the cases of systematic error), resulting in an ideal-geometry side chain with an excellent rotamer and well packed all-atom contacts. The top view shows that the original and rebuilt side chains fit the terminal methyls into the same rather ambiguous density, but move the Cγ substantially. More recent versions of this DNA polymerase structure (e.g. PDB code 2hhv at 1.55 Å resolution; Warren et al., 2006) all use the new conformation. Manual rebuilding is facilitated by the fact that all-atom clashes are inherently directional, as are bond-angle distortions, while a good library of rotamer choices helps the user test all the alternatives.

For more extensive refitting, a fully featured crystallographic rebuilding program such as Coot (Emsley & Cowtan, 2004) is needed. MolProbity generates 'to-do' scripts that can be read into Coot, bringing up a button list where each entry will zoom to a problem area. In combination with the ability of Coot to use REDUCE and PROBE interactively to generate all-atom contact dots, these features make it easier to address the problems diagnosed by MolProbity. Any rebuilding that moves atoms must of course then undergo further crystallographic refinement.

Our own laboratory tested the combined cycle of MolProbity, rebuilding and refinement on about 30 protein structures as part of the SouthEast Collaboratory for Structural Genomics (Arendall et al., 2005), finding that its early application led to a smoother structure-solution process and demonstrably better final structures. In addition to backward-fitted side chains, commonly corrected problems included peptide flips, switched backbone and side chains near chain ends, 'waters' that were really ions, noise peaks or unfit alternate conformations and occasionally a shift in sequence register. Many other crystallographic groups have since adopted these methods.

3.2. Automated corrections

For correcting RNA-suite outliers, we have collaboratively developed the independent program RNABC (Wang et al., 2008), which performs an automated search for more suitable backbone conformations of an RNA suite diagnosed with a bad ribose pucker or serious clashes. It leaves the more accurately determined bases and P atoms fixed in place and performs a pruned but systematic search through the other parameters, outputting all acceptable alternatives found within user-set tolerance limits.

Recently, we have developed and tested the AUTOFIX program for automated correction of diagnosed backward-fitted Thr, Val, Leu and Arg side chains (Headd et al., 2009). In contrast to Asn/Gln/His flips, which simply exchange atoms and do not change the agreement with the data, these more complex side chains require real-space refinement in order to determine the proper correction, and crystallographic re-refinement after the approximate 180° flips have been made. The original version used Coot to perform rotamer selection and real-space refinement for the proposed corrections, with MolProbity diagnosis before and after. Results were checked by re-refinement.
Run on a sample of 945 PDB files, AUTOFIX accepted corrections for over 40% of the diagnosed bad Thr, Val and Leu side chains and 15% of the bad Arg side chains, for a total of 3679 corrected side chains. A second version, now in the testing stage, substitutes PHENIX real-space refinement, has a faster Python wrapper and also works on Ile; it will soon be incorporated into MolProbity. The most important of our requirements for AUTOFIX is that it does no harm: we are willing to miss some of the possible corrections in order to ensure that those we accept are essentially always true improvements. AUTOFIX should provide MolProbity users with an easy and reliable way of making an initial set of meaningful improvements to their protein structures. Thr and Arg, in particular, make hydrogen bonds that are often important at active sites or binding interfaces, and since these side chains are asymmetrical such interactions change drastically if the side chain is fitted backwards. Such improvements were often seen in the test set.

4. Other MolProbity utility functions

4.1. Interface analysis

PROBE can also be used to calculate the all-atom contacts at interfaces, e.g. between two chains of a structure or between a protein and a ligand. Access to this feature is provided in MolProbity by the 'Visualize interface contacts' analysis option after H atoms have been added. The user chooses the chains and/or the molecular types for which to calculate the contacts (e.g. protein versus protein, or protein versus heteroatoms or RNA). This functionality creates both a kinemage with the resulting all-atom contacts displayed on the model and a text list of the atom pairs in contact.

4.2. Protein loop fitting

MolProbity includes the Java program JIFFILOOP for suggesting protein-fragment conformations that can fit within a gap in a protein structure. We have defined a seven-parameter system that describes the spatial relationship between any two peptides: the sequence separation, the distance between the two inner Cα atoms, two pseudo-angles and three pseudo-dihedrals (an illustrative computation of these parameters is sketched below). We used this system to create a library of B-factor-filtered fragments from one to 15 peptides long, drawn from our Top5200 database of structures: a set containing one structure from each 70%-nonredundant group defined by the PDB, requiring that the average of the resolution and the MolProbity score be ≤2.0. MolProbity runs JIFFILOOP to search this library for candidate fragments to fill gaps within a structure. Alternatively, users can enter beginning and ending residue numbers and MolProbity will search for fragments that can fit between those two residues. Because this process can be fairly time-intensive, JIFFILOOP is not listed under 'Suggested Tools' and is currently only accessible under 'All Tools' or at the Site map. Also, owing to the size of this package, it must be added separately to the installation for a standalone MolProbity server.

4.3. Kinemage construction and viewing

MolProbity provides scripts (under the 'Make simple kinemages' option) for constructing a number of commonly used kinemage three-dimensional interactive visualizations, such as ribbons and various types of stick figures. This functionality is useful for quick browsing of a structure or for the initial creation of an illustration or presentation. The file-input page can also accept upload of pre-existing kinemage files for direct on-line viewing within the built-in kinemage viewer KiNG.
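The text does not spell out which atoms define the §4.2 pseudo-angles and pseudo-dihedrals, so the sketch below is an illustration under our own assumptions: it uses the four Cα atoms flanking the gap (two per anchoring peptide), plus one extra neighbor on each side, with standard angle and dihedral formulas. Only the shape of the seven-parameter idea is intended to match JIFFILOOP.

```python
import numpy as np

def angle(p0, p1, p2):
    """Angle at p1, in degrees."""
    u, v = p0 - p1, p2 - p1
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral about the p1-p2 axis, in degrees."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w),
                                       np.dot(v, w))))

def gap_descriptor(ca, i, j):
    """Seven illustrative parameters relating the peptide ending at residue
    i to the peptide starting at residue j; the atom choices here are our
    assumptions, not JIFFILOOP's own definitions."""
    a0, a1 = ca[i - 1], ca[i]     # two CA atoms of the N-side anchor
    b0, b1 = ca[j], ca[j + 1]     # two CA atoms of the C-side anchor
    return {
        "seq_sep": j - i,
        "inner_ca_dist": float(np.linalg.norm(b0 - a1)),
        "pseudo_angle_1": angle(a0, a1, b0),
        "pseudo_angle_2": angle(a1, b0, b1),
        "pseudo_dihedral_1": dihedral(a0, a1, b0, b1),
        "pseudo_dihedral_2": dihedral(ca[i - 2], a0, a1, b0),
        "pseudo_dihedral_3": dihedral(a1, b0, b1, ca[j + 2]),
    }

# synthetic CA pseudo-trace, purely to exercise the function
ca = np.cumsum(np.random.default_rng(0).normal(2.2, 0.3, (12, 3)), axis=0)
print(gap_descriptor(ca, i=3, j=8))
```

Matching library fragments then reduces to a nearest-neighbor lookup in this seven-dimensional space.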
4.4. Other file types and functions

MolProbity uses a built-in PDB 'het_dictionary' for the information needed to add H atoms to small-molecule ligands; the user can construct and read in a custom dictionary if their file contains novel ligands. There is also provision for either uploading or fetching an electron-density map from the Electron Density Server (Kleywegt et al., 2004) in any of several formats, to be viewed on-line in KiNG together with the model and validation results. To investigate functional sites that span asymmetric units, one can fetch a biological-unit file from the PDB. In the file-editing feature, the user can specify whether multiple 'models' are alternatives (as in an NMR ensemble) or have been pressed into service for the extra chains in the biological unit. Some X-ray structures are now treated as ensembles; for such cases, MolProbity internally splits the models and analyzes them separately, but constructs an outlier-summary strip-chart and a multi-model multi-criterion validation kinemage with both the models and their features under on/off button control. File editing also allows the deletion of chains either before or after hydrogen placement, specifying the resolution of the structure if it is not given in the file header, or removing unwanted H atoms. These tools make it easier and faster to analyze particular parts of a structure using MolProbity and help to maintain compatibility with older software. These options are always available as separate utility functions, independent of validation or hydrogen content.

4.5. PDB-format interconversion

The release of the remediated PDB version 3.0 format in August 2007 included a number of significant changes, particularly to H-atom names and to nucleic acid residue and atom names. In order to maintain compatibility with the PDB, we converted the entire MolProbity core to use the new format by default; this included updating REDUCE, PROBE, KiNG and PREKIN. However, users still need to analyze files in the older PDB version 2.3 format, so for backwards compatibility we created a Remediator script (available as a standalone Perl or Python script) that can interconvert between the old and new PDB formats (a toy illustration follows below). Whenever a file is input, MolProbity scans for old-format atom names and, if it detects any, runs the Remediator script to convert the input file to the new format automatically. After analysis, an option is available to run the Remediator script again and downgrade the output file to the old version 2.3 format if needed. This allows the MolProbity analysis tools to be used even together with older software that has not been updated to the new format.

5. Discussion

5.1. Global versus local, absolute versus comparative

There are three quite different purposes served by structure validation: a gatekeeper function on quality for reviewers or organizations, an aid to crystallographers in obtaining the most model accuracy from their data, and a guide for end users in choosing appropriate structures and confidence levels for the conclusions they want to draw. Validation criteria also come in distinct flavors. Those based on the diffraction data are inherently global with respect to the model; for instance, resolution (which is still the most valuable single-factor estimate of model accuracy) and R free (Brünger, 1992).
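To make the §4.5 renamings concrete, here is a toy stand-in for the Remediator script. It covers only three well known v2.3 to v3.0 changes (sugar * becoming ', O1P/O2P becoming OP1/OP2, and DNA residue names gaining a D prefix) and, unlike the real script, it simply assumes the chain is DNA rather than detecting DNA versus RNA.

```python
# toy PDB v2.3 -> v3.0 upgrade: a tiny subset of the real Remediator
DNA_RES = {"  A": " DA", "  C": " DC", "  G": " DG", "  T": " DT"}

def remediate_line(line):
    """Upgrade one ATOM/HETATM record (atom name in columns 13-16,
    residue name in columns 18-20 of the fixed-column PDB format)."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    atom = line[12:16].replace("*", "'")       # C1* -> C1', O2* -> O2', ...
    if atom.strip() == "O1P":
        atom = " OP1"
    elif atom.strip() == "O2P":
        atom = " OP2"
    res = DNA_RES.get(line[17:20], line[17:20])   # assumes DNA, not RNA
    return line[:12] + atom + line[16:17] + res + line[20:]

old = "ATOM     12  C1*   A A   5      11.000  22.000  33.000  1.00 20.00"
print(remediate_line(old))   # ... C1'  DA A   5 ...
```

The real Remediator also handles the repositioned digits in hydrogen names and can run the conversion in reverse for downgrading to version 2.3.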
On the data side, there are also gatekeeper checks for unusual problems such as twinning or gross data incompleteness. R.m.s.d. or r.m.s. Z values of deviations from geometrical target values are global, but they only evaluate procedural aspects of refinement and have little to do with model accuracy. Most other validation criteria are inherently local (at the residue or even atom level), including B factor, real-space measures such as RSR-Z (Kleywegt et al., 2004) and model-only measures such as the various MolProbity criteria described here. Any local measure becomes global when expressed in some normalized form across the entire structure, such as an average, a distribution match or a percentage occurrence of outliers. Strictly local measures are usually not resolution-dependent, but their globally defined versions often are. For some purposes, the desirable form of a measure is a comparison (usually a percentile rank) with the cohort of PDB structures at similar resolution; a sketch of this percentile bookkeeping appears at the end of the article. MolProbity currently provides resolution-group percentiles for clashscore and for MolProbity score and will probably expand these to other criteria.

Reviewers and gatekeepers are primarily interested in global relative measures, such as resolution-dependent percentiles, and to some extent in absolute local flags for judging the support behind specific claims. Crystallographers need global relative measures to judge how well they have made use of their data, but it is the local measures, especially specific outliers, that are crucial in helping them achieve a more accurate structure and avoid dubious claims about poor local regions (such as an invisible inhibitor). End users need absolute global measures to choose between structures and absolute local measures to judge the reliability of the particular features they find of interest. Because of the importance of improving and evaluating the accuracy of individual details of biological importance, both in each structure and in the database as a whole, we have chosen in MolProbity to emphasize the calculation and user-friendly display of local indicators. We have also tried to minimize 'false alarms', so that a flagged outlier is almost always worth a close look.

5.2. Impact on database quality

Since MolProbity was first made available in late 2002, serious user work sessions (those performing some operation on an input coordinate file) have multiplied by a large factor each year, with a cumulative total now approaching 100 000, by thousands of distinct users. In addition, many companies and structural genomics centers run their own MolProbity servers internally, and some aspects have been incorporated into other software or meta-servers. 80% of MolProbity input files are uploaded, presumably by working structural biologists; the rest are fetched from databases, presumably by end users. Those end users also include students, since MolProbity is increasingly used for instructional exercises in biochemistry classes from high school to graduate level. MolProbity's unique feature is clash analysis from all-atom contacts, which provides a sensitive evaluation independent of refinement targets. Not surprisingly, the average clashscore remained constant (either globally or by resolution) up to 2002, since there was then no feasible way of targeting or even measuring all-atom clashes.
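The before-and-after comparison described here, and continued below for Asn/Gln/His flips, amounts to fitting separate lines to yearly statistics on either side of the 2002 break. A minimal sketch on synthetic data follows; the real analysis in Fig. 8 uses clashscore and flip percentages computed from PDB depositions.

```python
import numpy as np

# synthetic yearly mean clashscores: roughly flat before 2003,
# improving afterwards (stand-ins for the real PDB statistics)
years = np.arange(1996, 2010)
clash = np.where(years <= 2002,
                 20.0 + 0.1 * (years - 1996),
                 20.6 - 1.2 * (years - 2002))
clash = clash + np.random.default_rng(2).normal(0.0, 0.4, years.size)

# separate linear fits before and after the end of 2002
pre, post = years <= 2002, years > 2002
slope_pre = np.polyfit(years[pre], clash[pre], 1)[0]
slope_post = np.polyfit(years[post], clash[post], 1)[0]
print(f"slope before 2003: {slope_pre:+.2f} clashscore units per year")
print(f"slope after 2003:  {slope_post:+.2f} clashscore units per year")
```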
The percentage of incorrect Asn/Gln/His flips also remained level or rose slightly prior to 2003, despite the availability of a hydrogen-bond-based flip analysis in WHAT IF (Hooft et al., 1996), and even while refinement methods, automation, and Ramachandran and rotamer quality all improved. To evaluate the contribution MolProbity has made to crystallographic model quality in general, we have therefore plotted clashscore and Asn/Gln/His flips as a function of time in Fig. 8, with separate linear fits before and after the end of 2002. Gratifyingly, in both cases there is a clear trend of improvement since 2003, and median values also improve very steadily over that period. Anecdotal evidence indicates that this trend is mainly a consequence of thorough adoption of MolProbity-based methods by a small but growing fraction of crystallographers; there is therefore still much scope for further improvement.

6. MolProbity availability

MolProbity is freely available for download from http://molprobity.biochem.duke.edu for use as a local server. This option requires either Linux or Mac OS X, together with PHP and Apache; instructions for installing MolProbity locally are included with the download. A local installation allows users to access the MolProbity analysis tools without internet access, and allows companies with privacy or confidentiality concerns to use MolProbity. One of the most significant advantages of a local installation, however, is access to the command-line tools, which provide the major MolProbity analyses without the web interface. Several scripts are also included that let users run MolProbity analysis on a set of files rather than one at a time. Some of the more useful command-line scripts include the following: scripts for adding H atoms, with or without flips; a script for obtaining overall scores for a set of files; and a script for calculating a residue-by-residue analysis of a structure.

For users of the PHENIX crystallography system (Adams et al., 2002, 2009), a number of the main MolProbity quality-analysis tools have been incorporated directly into PHENIX and are accessible through command-line tools or in the PHENIX GUI, including REDUCE, PROBE, RAMALYZE, ROTALYZE, CBETADEV and CLASHSCORE. Currently, only tabular results are provided; we are exploring the possibility of incorporating KiNG and validation visualizations into PHENIX. All of the individual programs called by MolProbity are also available, multi-platform and open source, from the software section at http://kinemage.biochem.duke.edu.
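Finally, since percentile ranks against a resolution cohort appear in both §2.6 and §5.1, here is a hedged sketch of that bookkeeping. The ±0.25 Å window is the one stated above; the data layout and the convention that a higher percentile is better are our assumptions.

```python
import numpy as np

def cohort_percentile(score, resolution, db_scores, db_res, window=0.25):
    """Percentile rank of 'score' among database entries whose resolution
    lies within +/- 'window' Angstroms of 'resolution'.

    Lower raw scores are better for both clashscore and MolProbity score,
    so a high percentile here means better than most of the cohort.
    """
    db_scores = np.asarray(db_scores, dtype=float)
    db_res = np.asarray(db_res, dtype=float)
    cohort = db_scores[np.abs(db_res - resolution) <= window]
    if cohort.size == 0:
        return None
    # fraction of cohort entries with a worse (higher) score
    return 100.0 * float(np.mean(cohort > score))

# synthetic cohort: clashscores tend to be worse at lower resolution
rng = np.random.default_rng(1)
res = rng.uniform(1.0, 3.5, 5000)
clash = rng.gamma(2.0, 2.0 * res)
print(f"{cohort_percentile(12.0, 2.0, clash, res):.0f}th percentile")
```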
                Author and article information

Journal: International Journal of Molecular Sciences (Int J Mol Sci), MDPI; ISSN 1422-0067
Published: 09 December 2020 (December 2020 issue)
Volume 21, Issue 24, Article 9377
                Affiliations
[1] Biomedical Research Institute, Korea Institute of Science and Technology (KIST), Hwarangro 14-gil 5, Seongbuk-gu, Seoul 02792, Korea; scshin84@kist.re.kr
[2] Center for Neuro-Medicine, Brain Science Institute, KIST, Seoul 02792, Korea; ph_karem2000@mans.edu.eg (A.K.E.-D.); juhyeonlee85@gmail.com (J.H.L.); shseo@kist.re.kr (S.H.S.); eunkbang@kist.re.kr (E.K.B.)
[3] Department of Medicinal Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura 35516, Egypt
[4] College of Pharmacy, Keimyung University, Daegu 42601, Korea; jihyun96031@naver.com (J.H.K.); seoyho@kmu.ac.kr (Y.H.S.)
[5] New Drug Development Center, Daegu-Gyeongbuk Medical Innovation Foundation, Daegu 41061, Korea; leeyuri45@dgmif.re.kr (Y.L.); yujihoon@dgmif.re.kr (J.H.Y.)
[6] Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology (UST), Seoul 02792, Korea
                Author notes
[*] Correspondence: eunice@kist.re.kr (E.E.K.); gkeum@kist.re.kr (G.K.)
[†] These authors contributed equally to this work.

                Author information
                https://orcid.org/0000-0003-0156-680X
                https://orcid.org/0000-0002-7707-8055
                https://orcid.org/0000-0001-8767-8022
                Article
Article ID: ijms-21-09377
DOI: 10.3390/ijms21249377
PMCID: PMC7763603
PMID: 33317068
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 22 September 2020
Accepted: 03 December 2020
                Categories
                Article

                Molecular biology
Keywords: Hsp90 inhibitors, BIIB021 analogs, isoxazole, hydrophobic binding pocket, X-ray crystallography
