      Is Open Access

      SHELXT – Integrated space-group and crystal-structure determination

          Abstract

          SHELXT automates routine small-molecule structure determination starting from single-crystal reflection data, the Laue group and a reasonable guess as to which elements might be present.

          The new computer program SHELXT employs a novel dual-space algorithm to solve the phase problem for single-crystal reflection data expanded to the space group P1. Missing data are taken into account and the resolution extended if necessary. All space groups in the specified Laue group are tested to find which are consistent with the P1 phases. After applying the resulting origin shifts and space-group symmetry, the solutions are subject to further dual-space recycling followed by a peak search and summation of the electron density around each peak. Elements are assigned to give the best fit to the integrated peak densities and if necessary additional elements are considered. An isotropic refinement is followed for non-centrosymmetric space groups by the calculation of a Flack parameter and, if appropriate, inversion of the structure. The structure is assembled to maximize its connectivity and centred optimally in the unit cell. SHELXT has already solved many thousand structures with a high success rate, and is optimized for multiprocessor computers. It is, however, unsuitable for severely disordered and twinned structures because it is based on the assumption that the structure consists of atoms.

          Most cited references (19)

          Is Open Access

          Crystal structure refinement with SHELXL

          Introduction   The first version of SHELX dates back to about 1970 and, after extensive testing, it was first released in 1976. Since then the program system has been developed continuously. The early history has been described by Sheldrick (2008 ▶). The present paper is intended to explain the philosophical and crystallographic background to developments between 2008 and 2015 in SHELXL, the program in the SHELX system responsible for crystal structure refinement. Although SHELXL may also be used for the refinement of macromolecular structures against high-resolution data, most of the new developments have concentrated on the refinement of chemical structures, such as those published in Section C of Acta Crystallographica. Readers not familiar with SHELX may find it useful to look at Sheldrick (2008 ▶) before reading this paper. A major change since 2008 is that the distribution is performed via the SHELX homepage (http://shelx.uni-ac.gwdg.de/SHELX/), which also provides a great deal of documentation, tutorials and other useful information. The programs are updated more frequently than in the past and the list of recent changes should be consulted regularly to see if it is necessary to download a new version. The homepage also contains a list of registered users (but not their email addresses); currently there are over 8000 spread over more than 80 countries. SHELX workshops are announced on the homepage, and many of the talks given at these workshops may be downloaded there. SHELXL is compiled with the Intel ifort FORTRAN compiler using the statically linked MKL library, and is available free to academics for the 32- or 64-bit Windows, 32- or 64-bit Linux and 64-bit Mac OS X operating systems. Multithreading is achieved using OpenMP along the lines suggested by Diederichs (2000 ▶), and the program is particularly suitable for multiple-core processors. 
SHELXL and CIF format   The importance of depositing crystallographic data   Although the IUCr journals have led the way in insisting that experimental crystallographic data should be deposited, several leading chemical journals still only require the deposition of a CIF (Hall et al., 1991 ▶) containing just the results of the crystal structure determination and not the X-ray or neutron reflection data used to determine the structure. In this respect, biological crystallographers are more advanced. The PDB (Protein Data Bank; Berman, 2008 ▶) has required the deposition of reflection data since February 2008 and virtually all journals that report biological crystal structures, including high-profile journals such as Nature and Science, require a PDB ID for the structure. This has already had a considerable impact. For example, it has led to the retraction of several structures in which the data do not support the claim that a particular ligand was bound to a protein. One very recent example of the use of such deposited data (Köpfer et al., 2014 ▶) can be mentioned here, since it involved the use of SHELXL to refine occupancies and obtain standard uncertainties for them. For over 50 years, the accepted model (Hodgkin & Keynes, 1955 ▶) for the potassium channel present in many living systems was that it involved the transport of both potassium ions and water molecules, based on the argument that adjacent binding sites could not be occupied by K+ cations because they would repel one another, and so the intermediate sites must be occupied by water molecules. Several protein crystal structures were refined at modest resolution with alternating potassium ions and water molecules in the channel and appeared to support this model. However, to the authors’ credit, they deposited their reflection data, including the Friedel pairs, although that was not then obligatory. 
When sophisticated molecular dynamics (MD) calculations showed that only a model with adjacent K⁺ cations could account, by a sort of knock-on effect, for the very high potassium permeability observed, it was necessary to reinvestigate the structure using the deposited X-ray reflection data. Both the occupancy refinements with SHELXL and the analysis of the anomalous data with SHELXD (Schneider & Sheldrick, 2002) and ANODE (Thorn & Sheldrick, 2011) showed conclusively that the four connected potassium sites are almost fully occupied, as predicted by the MD calculations.

Archiving crystallographic data

To make the deposition and archiving of reflection data as simple as possible, the CIF written by SHELXL now includes the .hkl reflection data file, embedded as CIF text:

_shelx_hkl_file
;
... reflection data in SHELX HKLF 2, 3, 4 or 5 format ...
;
_shelx_hkl_checksum 12345

The checksum provides a check that the data have not been corrupted accidentally. The .res results file from the refinement and the .fab file (see below), if used in the refinement, are embedded into the CIF in the same way. The SHELX program SHREDCIF may be used to extract these files from the CIF archive and rename the .res file to .ins, for example to perform further refinements with SHELXL. The intention is that such CIFs containing embedded data should become standard for deposition and archiving. It is particularly convenient that only one file is needed. CIF identifiers beginning with _shelx_ are reserved for use by the SHELX programs, but of course other program authors may use a similar construction for embedding the reflection data etc.
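The embedding described above can be illustrated with a minimal Python sketch that pulls the semicolon-delimited text block following a tag such as _shelx_hkl_file out of a CIF. This is not SHREDCIF and makes no claim to reproduce its behaviour: the function name extract_text_field is hypothetical, the checksum is not verified (its algorithm is not described here), and the sketch does not undo any character substitution applied during embedding.

```python
def extract_text_field(cif_text, tag):
    """Return the ';'-delimited text block that follows `tag`, or None.

    Hypothetical helper for illustration only. It assumes the tag sits
    alone on its line and the text block opens on the next non-blank
    line with ';' in the first column, as in the snippet above.
    """
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != tag:
            continue
        # Skip blank lines between the tag and the opening ';'
        j = i + 1
        while j < len(lines) and not lines[j].strip():
            j += 1
        if j >= len(lines) or not lines[j].startswith(";"):
            return None
        body = []
        # Text may follow the opening ';' on the same line
        if lines[j][1:].strip():
            body.append(lines[j][1:])
        for k in range(j + 1, len(lines)):
            if lines[k].startswith(";"):  # closing delimiter
                return "\n".join(body)
            body.append(lines[k])
        return None  # block was never closed
    return None


example = "\n".join([
    "data_demo",
    "_shelx_hkl_file",
    ";",
    "   1   0   0   12.30    0.50",
    "   0   0   0    0.00    0.00",
    ";",
    "_shelx_hkl_checksum 12345",
])
hkl = extract_text_field(example, "_shelx_hkl_file")
```

Here hkl holds the two embedded reflection lines, which could then be written to a .hkl file. A production tool would also need to handle the other embedded files (.res, .fab) and validate the checksum.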
Users who do not wish to preserve their carefully measured data for posterity in this way have criticized the embedding of the reflection data on the grounds that (a) the resulting CIF is too large for submission with a paper for publication and that (b) certain CIF-processing programs take a long time to read such a CIF and may even choke in the attempt. However, it should be noted that (a) the figures submitted with a paper often involve larger files and (b) SHREDCIF can usually read and dismember such a CIF in less than one second! To generate a CIF without intensity data for other purposes, e.g. for input to a molecular graphics program, the keyword NOHKL may be used in the SHELXL ACTA instruction. It is difficult to understand why several leading chemical journals still only require the deposition of the atom co­ordinates, etc., but not the reflection data, especially now that the Cambridge Structural Database (CSD; Allen, 2002 ▶) accepts the new CIFs and strongly encourages deposition of the reflection data. A simple solution would be for journals to require a confirmation that the full data have been deposited with the CSD (Bruno & Groom, 2014 ▶) or COD (Gražulis et al., 2012 ▶), analogous to the way in which the PDB requires deposition of the structural and reflection data before issuing a PDB ID. Including CIF items at the end of the .hkl file   Since SHELX76, the reflection data have been read until a reflection with indices 0,0,0 or a blank line (or card) or the end of the file was encountered. The rest of the file was never read by the SHELX programs. This means that additional data specific to that data set, such as details of the data collection and processing, may conveniently be appended to the .hkl file, which is a much safer way of preserving them than putting them in a separate file. 
For example, the Bruker scaling program SADABS (Krause et al., 2015) now appends CIF format items such as those shown below to the .hkl file that it outputs:

_exptl_absorpt_process_details 'SADABS 2014/4'
_exptl_absorpt_correction_type multi-scan
_exptl_absorpt_correction_T_max 0.7489
_exptl_absorpt_correction_T_min 0.7208
_exptl_special_details
;
The following wavelength and cell were deduced by SADABS from the
direction cosines etc. They are given here for emergency use only:
CELL 0.71072 6.100 18.294 20.604 90.006 89.992 90.000
;

SHELXL uses the CIF items found at the end of the .hkl file to replace items to which it would otherwise have given the value ‘?’. It ignores all other items. So in this example, the first four CIF items find their way (left justified) into the output CIF, but although _exptl_special_details is legal for a CIF it is not included as a CIF item because this CIF identifier would not otherwise have been output. However, it is still included in the .cif file as part of the embedded .hkl file, so that the information is not lost. Unfortunately, because of a fundamental CIF design weakness (the same character ‘;’ is used for both the beginning and end of a text item; it would have been better to have used a different terminator such as ‘:’), SHELXL has to replace ‘;’ in this example by ‘)’ when embedding the .hkl file, and SHREDCIF repairs the damage by turning a leading ‘)’ in an otherwise blank line back to ‘;’. In this example, the cell following _exptl_special_details is not the same as in the CELL instruction used in the .ins file, because there is a reorientation matrix in the HKLF 4 instruction to transform the indices to the conventional P2₁2₁2 setting for the space group. However, it is still useful to preserve it in case the .hkl file becomes orphaned.
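The reading convention described above (reflections are read until a 0,0,0 reflection, a blank line or the end of the file, and anything after that point may carry appended CIF items) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SHELXL reading routine: the function name split_hkl is hypothetical, the indices are assumed to occupy three fixed-width fields of four characters each as in HKLF 4 data, and treating an unparsable line as the start of the trailer is a choice made for this sketch only.

```python
def split_hkl(text):
    """Split .hkl text into (reflection_lines, trailing_lines).

    Assumes HKLF 4 style fixed-width indices: h, k, l in columns
    0-3, 4-7 and 8-11. Hypothetical helper, for illustration only.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            # A blank line terminates the reflection list
            return lines[:i], lines[i + 1:]
        try:
            h, k, l = (int(line[c:c + 4]) for c in (0, 4, 8))
        except ValueError:
            # Sketch-only choice: a line that is not a reflection
            # is taken as the first line of the trailer
            return lines[:i], lines[i:]
        if h == k == l == 0:
            # The 0,0,0 reflection terminates the list
            return lines[:i], lines[i + 1:]
    return lines, []  # end of file reached without a terminator


sample = (
    "   1   0   0   12.30    0.50\n"
    "   0   2   0    8.10    0.40\n"
    "   0   0   0    0.00    0.00\n"
    "_exptl_absorpt_correction_type multi-scan\n"
)
reflections, trailer = split_hkl(sample)
```

In this example reflections contains the two data lines and trailer contains the appended CIF item, which is the part a program such as SHELXL could scan for items that would otherwise be output as ‘?’.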
Refinement against neutron diffraction data and special facilities for H atoms   The new features in SHELXL for refinement against neutron data have been discussed recently by Gruene et al. (2014 ▶). If a NEUT instruction is placed before SFAC, neutron scattering factors are employed, and the default bond lengths to H or D atoms are lengthened to correspond to internuclear distances rather than the distances appropriate for refinement against X-ray data. Whereas for X-rays H and D are treated specially, for neutrons they are treated as normal atoms. The HFIX and AFIX instructions may still be used to generate starting positions for H and D atoms, but it is recommended to use geometric restraints rather than a riding model for their refinement against neutron data. This is particularly important when anti-bumping restraints are applied; they work much better for a restrained than for a riding-model refinement of the H and D atoms against neutron data. Chiral volume restraints for refinement against neutron data   The chiral volume restraint CHIV, which is often used for macromolecular refinement, is interpreted as follows if NEUT is set. If three atoms other than H or D are bonded to the atom in question, the H and D atoms are ignored and the CHIV restraint operates in the same way as for a refinement against X-ray data. If there are exactly three bonded atoms including H and D, the latter are used in the restraint. Thus, CHIV 0 N1 could be used to restrain a terminal –NH2 group to be planar, and CHIV with a nonzero target value could be used to make it nonplanar. Anisotropic refinement of H and D atoms against neutron data   Since the neutron scattering factors for H, and especially for D, are of a similar order of magnitude to those for other atoms, H and D also need to be refined anisotropically for refinement against neutron data. 
Unfortunately, this results in about twice as many parameters as for a standard refinement against X-ray data, and the number of data available may well be less than for an X-ray refinement, so further restraints such as the new RIGU rigid-bond restraint (Thorn et al., 2012 ▶) may be required. The RIGU restraints require that the relative motion of bonded atoms is at right angles to the bond joining them. This sets up three restraints per atom pair, one of which is equivalent to the classical rigid-bond restraint DELU. RIGU is very generally applicable and it is almost always safe to add a RIGU instruction without further parameters to the .ins file. The resulting displacement ellipsoid plots tend to appear chemically more reasonable than those from an unrestrained refinement and there is usually little change in the final R factors. The following example, using data from Lübben et al. (2014 ▶), is a little different, because it involves the anisotropic refinement of all atoms, including H atoms, using SHELXL against neutron diffraction data collected at 9 K. The .ins file was the same as that used for refinement against X-ray data, except that: (i) a NEUT instruction was placed before SFAC, so that neutron scattering lengths were used instead of X-ray scattering factors; (ii) instead of using a riding model for the refinement of the H atoms, SADI (equal distance) restraints were applied to the O—H bonds in the water molecule, the C—H bonds in the CH2 and CH3 groups, and the H⋯H distances within the CH3 group; and (iii) a much larger value was obtained for the extinction parameter (EXTI). Close inspection of the atomic displacement ellipsoids in Fig. 1 ▶(a) shows that the assumption that the relative motion of the H atoms is at right angles to the bonds holds well, even for the unrestrained refinement. The refinement with tight RIGU restraints (RIGU 0.0001) for the bonded atoms (Fig. 
1 ▶ b) looks very similar, but the H-atom displacement ellipsoids are aligned so that their smallest principal axes are even closer to the bond directions, as required when the motion is at right angles to the bonds. However, Fig. 1 ▶(b) also reveals a small weakness of the rigid-bond assumption: the H-atom displace­ment ellipsoids appear to be slightly squashed in the direction of the bond. This is probably because the amplitude of the zero-point motion along the bond is larger for the H atom than for the atom to which it is bonded, because of the smaller mass of the former, but the rigid-bond restraint tries to make them equal. As a result, the R 1 value is slightly higher for the RIGU-restrained refinement (0.0342 rather than 0.0304). This effect is only observable here because of the extremely low temperature (9 K) and the high-quality data; at higher temperatures, the RIGU restraints can be very useful to stabilize the anisotropic refinement of H and other atoms against neutron data. Fig. 1 ▶ also exhibits much larger atomic displacement ellipsoids for the H atoms than for the remaining atoms. At such low temperatures, the frequently made assumption that the isotropic displacement parameters of the atoms can be set to 1.2 or 1.5 times the equivalent isotropic U values of the atoms to which they are bonded is clearly not justified. However, at temperatures above about 100 K it has been shown that this assumption is less seriously flawed (Lübben et al., 2014 ▶). Capelli et al. (2014 ▶) recently showed that Hirshfeld atom refinement provides a much more accurate way of deriving anisotropic displacement parameters for H atoms from X-ray data. Other new facilities for H atoms and CF3 groups   Except where the NEUT instruction is used, both H and D are now treated as special in the input syntax. This is useful when both are present, e.g. when the crystals came from an NMR tube containing a deuterated solvent. 
The AFIX instructions for CH3 groups may now also be used to set up CF3 groups, but it is better to refine these as rigid groups or with distance restraints (DFIX or SADI) than by applying a riding model, because the latter can be unstable. An HTAB instruction without any parameters instructs the program to find possible hydrogen bonds. These now include C—H⋯O interactions when the C atom is directly or in­directly attached to an electronegative atom (Taylor & Kennard, 1982 ▶). Such weak interactions involving H atoms attached to peptide Cα atoms are common in protein structures (Desiraju & Steiner, 1999 ▶). The resulting full HTAB and EQIV instructions are appended after the END instruction of the .res file and need to be (selectively) transferred to the beginning of the .ins file, so that they will be included in the CIF generated by the next refinement. This facilitates the generation of tables of hydrogen bonds, and helps to prevent hydrogen bonds involving symmetry-equivalent atoms from being overlooked. Absolute structure determination   In the distant past, it was often assumed that it was necessary to include a heavy atom, e.g. by making a rubidium salt or bromobenzoate derivative, in order to obtain a reliable absolute structure, for instance to establish which enantiomer of a chiral molecule was correct. Since then, experimental and computational methods have made such progress that the absolute structure can often even be determined with Mo Kα radiation when the heaviest atom is oxygen (Escudero-Adán et al., 2014 ▶). When the 2008 SHELX paper was written, the method of choice to determine the absolute structure was to refine the Flack parameter (Flack, 1983 ▶) as one of the parameters in a full-matrix refinement. 
Since then it has become clear that this led to a substantial overestimation of the standard uncertainty of the Flack parameter, and that post-refinement methods using either a Bayesian approach (Hooft et al., 2008) or quotients or differences of the Friedel opposites as observations (Parsons et al., 2013) give more reasonable estimates of the Flack parameter, and especially its standard uncertainty. This led to the IUCr/checkCIF requirement that Friedel opposites should not be merged in the deposited data. For small-molecule refinements with SHELXL, the input .hkl file should contain the unmerged data. This enables the program to produce a more complete output CIF and to estimate the Flack parameter using the Parsons quotient method for all noncentrosymmetric structures. This approach works well even for twinned structures. For structure refinement, the reflections are, by default, merged according to the point group of the crystal structure (MERG 2 in SHELXL notation). In the relatively rare cases that result in an intermediate value of the Flack parameter with a small standard uncertainty, in order to obtain the most accurate calculated intensities and hence difference density, it is still necessary to refine the Flack parameter by the full-matrix method (TWIN/BASF). However, a Flack parameter of 0.5 with a small standard uncertainty is a warning sign that the true space group might be centrosymmetric!

Estimates of standard uncertainties

One side effect of the inclusion of Friedel opposites is that there will be nearly twice as many data for the refinement of a noncentrosymmetric structure, which, using the usual least-squares algebra, would lead to a reduction in the estimated standard uncertainties of all parameters by a factor of nearly √2. SHELXL now uses the number of unique reflections as defined by the Laue group, rather than the number of observations, in the formula used to estimate the standard uncertainties (Spek, 2012).
It could be argued that all reflection intensities are independent measurements, and this was approximately true for unscaled data from point detectors before the introduction of focusing optics. However, it is now standard practice to scale the data so that equivalent reflections (usually including Friedel opposites) become more equal, in order to correct for absorption and differences in the effective crystal volume irradiated, and then the equivalent reflections can no longer be regarded as independent observations. In some cases, this change may result in a modest increase in the estimated standard uncertainties, but these were generally underestimated anyway (Taylor & Kennard, 1986 ▶). The new method of estimating standard uncertainties also applies to twinned structures, where some SHELXL97 users were required by referees to throw away some of their carefully measured data so that the number of observations would be equal to the number of unique reflections. Now all the experimental data may be used and the estimated standard uncertainties should be more realistic. With SHELXL97, it was necessary to use the third least-squares parameter to correct the estimated standard uncertainties; this is not required anymore (except for ‘SQUEEZEd’ structures). Input of partial structure factors   The new ABIN instruction was primarily designed to facilitate the use of the SQUEEZE facility (Spek, 2015 ▶) in the program PLATON (Spek, 2009 ▶), but it can also be used to input a bulk solvent model for a macromolecule. PLATON calculates the partial structure factors corresponding to a blob of un­modelled difference density and writes them to the .fab file. The ABIN instruction causes h, k, l, A and B to be read from the .fab file, where A and B are the real and imaginary components, respectively, of a partial structure factor. These reflections are read in free format (one reflection per line) and may be in any order. 
Duplicates, systematic absences and reflections outside the resolution limits for refinement are ignored. Symmetry equivalents are generated automatically. At least one symmetry equivalent (according to the point group) of each reflection present in the .hkl file, including all reflections in all twin components if the structure is twinned, should be present in the .fab file. For twinned structures, it is necessary first to use the new LIST 8 instruction (see below) to generate detwinned data for input to PLATON. The A and B values refer to the untwinned structure, but in the case of a twinned structure, after applying the appropriate symmetry transformations, they are added to the calculated structure factors for all twin components. ABIN takes two free variable numbers (Sheldrick, 2008) n₁ and n₂ as parameters. The A and B values read from the .fab file are multiplied by k exp[−8π²U sin²θ/λ²], where k is the value of free variable n₁ and U is the value of free variable n₂. These two optional parameters may be needed when the partial structure factors come from a bulk solvent model of a macromolecule, but are probably not needed for use with SQUEEZE. SQUEEZE should only be used where it is not possible to model the disordered solvent by normal methods, e.g. when there is a continuous ribbon of diffuse difference density along one of the unit-cell axes. Partial structure factors and ABIN should always be used in preference to the old procedure of modifying the input .hkl file, which made it impossible to remodel the disordered density should a better method become available.

Extending the PART number concept

The use of PART numbers, introduced in SHELXL93, has proved invaluable in the refinement of disordered structures. Two atoms are considered to be bonded if they have the same PART number or if one of them is in PART 0.
The resulting connectivity table is used for the generation of H atoms (HFIX and AFIX), for setting up restraints such as DELU, SIMU, RIGU, CHIV, BUMP and SAME, and for generating tables of geometric parameters (BOND, CONF, HTAB). Usually, most of the atoms are in PART 0, but, for example, a molecule or side chain disordered over three positions could use PART 1, PART 2 and PART 3. If the PART number is negative, bonds are not generated to symmetry-equivalent atoms. It should be noted that positive PART numbers 1, 2, 3 etc. correspond to the alternative location indicators A, B, C etc. in PDB format. However, this notation is difficult to use when there is a disorder within a disorder. A BIND instruction that specifies two numbers may now be used to get around this problem. For example, BIND 2 4 means that, in addition to the usual PART rules, atoms in PART 2 may also bond to atoms in PART 4. Negative PART numbers are allowed in the BIND instruction. As an example, consider an n-butyl substituent coordinated through atom C1 that splits into two disorder components at C2. Atom C1 is then in PART 0, C2A, C3A and C4A in PART 1, and C2B, C3B and C4B in PART 2. Atom C1 is bonded to both C2A and C2B but, because these two atoms have different PART numbers, H atoms will be generated correctly using the HFIX instruction. However, if there is a further disorder starting at atom C3B, this cannot be handled easily by SHELX97. Atoms C3B and C4B can be split into C3B′ and C4B′ in PART 3 and into C3B′′ and C4B′′ in PART 4, but then atoms C3B′ and C3B′′ are not bonded to C2B because they have different nonzero PART numbers. Extra bonds could have been inserted into the connectivity table with:

BIND C2B C3B′
BIND C2B C3B′′

but then HFIX or AFIX would still not generate the correct H atoms, because they need to refer to the PART numbers of the neighbouring atoms too. However, the alternative

BIND 2 3
BIND 2 4

now enables the H atoms to be generated correctly.
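The PART and BIND rules stated above can be summarized in a short Python sketch. It covers only the part-number test described in the text (same PART, or one atom in PART 0, or a pair of PART numbers linked by a numeric BIND instruction). It ignores the distance criteria that decide whether two atoms are close enough to bond and the special symmetry behaviour of negative PART numbers, and the function name may_bond is hypothetical.

```python
def may_bond(part_a, part_b, bind_pairs=()):
    """Return True if the PART numbers allow a bond between two atoms.

    bind_pairs holds the number pairs from instructions such as
    'BIND 2 3'. Hypothetical helper: only the PART/BIND test from
    the text is modelled, not distances or symmetry.
    """
    # Usual PART rule: same PART, or either atom in PART 0
    if part_a == part_b or part_a == 0 or part_b == 0:
        return True
    # Extension: a numeric BIND links two different PART numbers
    return any({part_a, part_b} == {p, q} for p, q in bind_pairs)


# PART numbers from the n-butyl example in the text
parts = {"C1": 0, "C2A": 1, "C2B": 2, "C3B'": 3, "C3B''": 4}
binds = [(2, 3), (2, 4)]  # BIND 2 3 and BIND 2 4

without_bind = may_bond(parts["C2B"], parts["C3B'"])
with_bind = may_bond(parts["C2B"], parts["C3B'"], binds)
across_split = may_bond(parts["C3B'"], parts["C3B''"], binds)
```

Without the BIND instructions C2B (PART 2) and C3B′ (PART 3) are not connected; with them the connection is allowed, while the two alternative positions C3B′ and C3B′′ remain unconnected to each other.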
Since SHELXL allows atoms to have the same names if they have different PART numbers, atoms C3A, C3B′ and C3B′′ could all be labelled C3 in this example. This would simplify the naming of the H atoms, but might confuse non-SHELX programs that read the .res file. As with almost every disorder, the use of RIGU is strongly recommended here.

Other new features in SHELXL

One of the most common cases of instability in crystal structure refinements is when the atomic displacement parameters refine to appreciably negative values. The new XNPD instruction may be used to combat this. When an isotropic displacement parameter, or a principal component of an anisotropic displacement parameter, refines to a value less than (i.e. more negative than) the value specified with the XNPD instruction, it is replaced by that value, and the displacement parameters Uiso or Uij are recalculated. Thus, the default setting of XNPD -0.001 avoids the risk of the refinement becoming unstable, but still leads to nonpositive definite (NPD) atoms being recognized and reported. For problematic cases, it may be desirable to set XNPD to a small positive value. However, it should then first be checked that the negative value was not caused by an error in the input file, e.g. an incorrect element type or site-occupation factor. The new LIST 8 option writes h, k, l, Fo², σ(Fo²), Fc², ϕ (phase angle in degrees), d spacing in ångström (Å) and 1/√w in CIF format to the .fcf file, where w is the weight derived from the weighting scheme and used in the refinement. For weak reflections, 1/√w should be only a little larger than σ(Fo²). This list is on an absolute scale and is detwinned, merged (according to the point group of the crystal structure) and sorted, but without eliminating the anomalous contributions (except in the calculation of ϕ, so that the corresponding electron density is real).
This option is essential for applying the SQUEEZE option in PLATON to twinned structures, but also has other uses. RTAB D2CG followed by atom names may be used to calculate the distance between the first named atom and the unweighted centroid of the remaining atoms, together with its standard uncertainty. This can be used to calculate distances to ring centroids, for example. As in SHELXL97, ‘+filename’ may be used to insert further instructions whilst reading the .ins file. These instructions are not echoed to the .res file. The new ‘++filename’ may be used to insert instructions that should be echoed to the .res file. The ‘+filename’ instruction itself is echoed to the .res file but ‘++filename’ is not. These instructions are useful for reading in long lists of restraints, etc. Although the SAME instruction for generating distance restraints is very convenient, especially when combined with the use of residues (RESI) so that the same atom names may be used when there are several chemically identical solvent molecules, it is less convenient when some of those solvent molecules are disordered, for example, tetrahydrofuran (THF), with one atom either above or below the plane of the other four. A SADI instruction with no parameters now causes SADI (similar distance) restraints to be generated from all the SAME instructions. These appear after the END instruction in the .res file. They can be moved to the start of the new .ins file, and edited and extended to give fine control over the refinement of such disorders. The TWIN instruction no longer requires integer matrix elements. The matrix is used to generate the indices of the reflections of the twin components, and if they differ by more than 0.1 from integers they are ignored. 
This enables the refinement of rhombohedral obverse/reverse twins, and is also useful for pseudomerohedral twins in which some of the reflections of a minor twin domain overlap nearly perfectly with reflections of the major domain and have to be taken into account, and other reflections of the minor domain do not overlap and can be ignored. If the twin components are more equal, the HKLF 5 format reflection data may be a better approach. Details of further changes since 2008 may be found on the SHELX homepage (http://shelx.uni-ac.gwdg.de/SHELX/). Conclusions   This account of changes and extensions to SHELXL since 2008 is testimony to the continuous development of the structure refinement techniques that is still taking place. In that time, CIF has advanced to become the standard for the deposition and archiving of crystallographic data, and this is reflected in many of the changes in SHELXL. The .ins and .hkl files used for input to SHELXL have remained, with very minor exceptions for which there were good reasons, upwards compatible since SHELX76. Another reason why SHELX has remained popular over many generations of computer hardware is its strict ‘no dependencies’ philosophy: no external programs, libraries (such as DLLs) or environment variables are required to run any of the SHELX programs (except SHELXLE).

            A short history of SHELX

            An account is given of the development of the SHELX system of computer programs from SHELX-76 to the present day. In addition to identifying useful innovations that have come into general use through their implementation in SHELX, a critical analysis is presented of the less-successful features, missed opportunities and desirable improvements for future releases of the software. An attempt is made to understand how a program originally designed for photographic intensity data, punched cards and computers over 10000 times slower than an average modern personal computer has managed to survive for so long. SHELXL is the most widely used program for small-molecule refinement and SHELXS and SHELXD are often employed for structure solution despite the availability of objectively superior programs. SHELXL also finds a niche for the refinement of macromolecules against high-resolution or twinned data; SHELXPRO acts as an interface for macromolecular applications. SHELXC, SHELXD and SHELXE are proving useful for the experimental phasing of macromolecules, especially because they are fast and robust and so are often employed in pipelines for high-throughput phasing. This paper could serve as a general literature citation when one or more of the open-source SHELX programs (and the Bruker AXS version SHELXTL) are employed in the course of a crystal-structure determination.
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Structure validation in chemical crystallography

              1. Introduction

              In the late 1960s, only 40 years ago, a routine small-molecule crystal structure determination in the setting of a well equipped crystallography laboratory would take several months. The bottlenecks were the data-collection, structure-solution and structure-refinement stages. Since then, data collection has advanced from a time-consuming film-based and serial detector-based technique to the current area detector-based systems, thus speeding up this stage by at least an order of magnitude. Modern CCD detector-based systems can easily collect 1000 small-molecule data sets in a year. The currently available direct methods for structure solution have essentially solved the long-standing phase problem in small-molecule crystallography given crystals of sufficient quality. Easy-to-use structure-determination software is now widely available and often comes with the data-collection hardware. The computing power needed for data processing, structure solution and refinement, once expensive and a monopoly of the University Computer Centre, is nowadays ubiquitous, cheap and fast on the personal computer platform. Therefore, given a routine structure determination, it is now quite possible to collect diffraction data, solve and refine the structure and send off a structure report for publication in Acta Crystallographica Section E within a day. This development is clearly demonstrated by the growth in the number of small-molecule structures that are published each year. This number has increased exponentially over the past 40 years from about 1000 in 1967 to over 35 000 in 2007. It should be noted that this last figure is a lower bound of the actual number of small-molecule structure determinations that are carried out each year. It is likely that a similar number of studies never reach the literature.
The publication of a crystal structure as part of a research paper is still a time-consuming activity and remains a bottleneck, often together with the problems of obtaining publication-quality crystals. Nowadays, the majority of small-molecule crystal structures are determined to ‘confirm’ the outcome of synthetic chemical work. The confirmation of a newly prepared compound by a crystal structure is generally a requirement for the publication of the associated chemistry in major chemical journals. Seeing is believing. Crystallography is in this sense often used as an analytical tool. However, there is a problem. The number of experienced crystallographers dedicated to single-crystal studies has certainly not increased in proportion to the number of reported studies. Many single-crystal structure analyses are currently carried out by non-experts using the available black-box software. Often, for understandable reasons, such investigators lack sufficient experience to avoid the many possible pitfalls, such as an incorrect atom-type assignment, that may be obvious to an expert. In the past, all unusual aspects of a structure analysis were supposed to be discussed in a publication with sufficient detail for both the reader and referee to make their own judgment about a claimed result. Nowadays, crystallography is considered by many chemical journals as routine and the crystallographic information is, at best, supplied in a footnote or as supplementary material with very limited details, if any, given in the published text. The chances are therefore high that papers are accepted for publication without crystallographic referees ever having looked at the supporting material. Unfortunately, the number of experienced crystallographic referees has decreased dramatically. As a result, the literature and databases, such as the Cambridge Structural Database (CSD; Allen, 2002), include obviously incorrect structures associated with formally refereed papers.
About 12 years ago (Linden, 2007), a crystal structure-validation project was started in the context of the journals of the International Union of Crystallography in order to address the refereeing issue and the time-consuming work that went into the checking of the supplied data for completeness and consistency. Its initial implementation was used to evaluate papers submitted to Acta Crystallographica Section C. At that time, it was already a requirement of the journal that the crystallographic data had to be provided in the computer-readable CIF format (Hall et al., 1991). The submission of electronic data files allowed the validation software to perform a number of quality and validity checks and to create a report in the form of ALERTS on issues to be addressed by authors and referees. Soon afterwards, further validation tests on structural issues were added. These tests are incorporated as part of the structure-analysis tools that are available in the PLATON package (Spek, 2003; Müller et al., 2006). The official IUCr structure-validation suite (checkCIF/PLATON) is currently available as an IUCr web service (http://journals.iucr.org/services/cif/checking/checkfull.html). Its use is required for every small-molecule crystal structure submitted for publication in the IUCr journals. Many major journals currently have similar requirements, as stated in their Notes for Authors. This paper reports on the current status of the IUCr validation project.

2. Structure validation

Structure validation addresses three simple but important questions: (i) Is the reported information complete? (ii) What is the quality of the analysis? (iii) Is the structure correct? The answer to the first question involves the use of a computerized checklist. The answers to the other questions are obviously less straightforward. The quality of a single-crystal study can be classified into one of four classes.
Class I consists of high-quality structure determinations that were carried out using data collected from a near-perfect crystal and under optimal experimental conditions. This will generally be data collection at a sufficiently low temperature and to a sufficiently high resolution. Such conditions are not always attainable. Inherently poor-quality crystals, disorder or a phase transition can be reasons why this goal cannot be reached. Class II structures are good structures that were determined under routine conditions or with experimental restrictions that are sufficient for the purpose of their study but not necessarily to the highest attainable quality. This class includes structures from data collected at room temperature or with high-pressure cells. Class III structures are poor structures that are essentially correct as far as the associated chemistry is concerned but for various reasons have limited accuracy. Reasons can be poor crystals, incomplete or weak and noisy diffraction data. Severe disorder that is difficult to model can be another reason. Class IV structures are incorrect. Important examples are those in which some of the element-type assignments are wrong or models with too few or too many H atoms. The impact of an incorrect published structure may be disastrous for research that builds on it. Examples include attempts to synthesize complex natural products on the basis of an incorrectly reported crystal structure (for an example, see Li, Burgett et al., 2001; Li, Jeong et al., 2001). Ideally, most issues reported by the validation software should already have been corrected at an early stage of the analysis and thus should never appear in published structures. Correction at the publication stage may be laborious or even impossible for unique crystalline samples. Clearly, structure validation is particularly important for addressing Class IV structures.
Class III structures may be useful to direct further research, but are generally not suitable for publication unless supported by an in-depth analysis. Crystallographic journals will aim at Class I structures, while noncrystallographic referees of chemical journals may even be satisfied with Class III structures. Validation should avoid having Class IV structures ever appear in print. The holy grail of structure validation is a tool that unequivocally assigns one of the above four quality classes to a given structure report. This would be performed on the basis of the application of objective criteria to the supplied structural and experimental data. The currently available IUCr tool, checkCIF/PLATON, is in this sense still far from that ideal. Instead, a list of ALERTS is produced that are classified according to their level of seriousness. These should be addressed by the investigator and those remaining evaluated by experts. The validation criteria currently in use are in many cases empirical and based on experience and tradition rather than based on science. Some criteria have changed over time. There is an obvious trade-off between being too critical, leading to too many false ALERTS, and being less sensitive and thus missing multiple weak indications of a serious problem. Eventually, a scientifically sound underpinning of the validation criteria will be sought. Automated structure validation as it is today has its origin in the definition of the CIF standard for the exchange and archival of structural and experimental data (Hall et al., 1991). CIF became ‘the standard’ in small-molecule crystallography with its adoption by the widely used SHELXL refinement-software package (Sheldrick, 2008). Acta Crystallographica Section C made CIF the required data-submission format for publication and it is currently the only way to submit a structural report to Acta Crystallographica Sections C and E.
Initially, software was developed to check the completeness of the supplied data, its consistency and its validity. It was soon realised that the availability of coordinate data also made it possible to base geometry and other calculations on these data. Examples are the detection of solvent-accessible voids in a structure that were missed by the investigators and the search for missed higher symmetry. This can be achieved by the use of readily available tools in the PLATON package (Spek, 2003). Validation issues are subdivided into four categories: (i) Missing or inconsistent data. (ii) Indicators that the structure model may be wrong or deficient. (iii) Indicators that the quality of the results of the study may be low. (iv) Cosmetic improvements, queries and suggestions. The validation software assigns one of four severity levels (A, B, C and G) to reported issues. Level A ALERTS usually indicate that corrective action is imperative or there has to be a scientifically acceptable explanation for the case at hand. Level G ALERTS concern issues that may be correct but should be checked. They can still point to serious problems that could not be analyzed in detail on the basis of the available data. Currently, about 400 validation tests have been implemented. Most tests result in a one-line ALERT message. Each test is associated with some documentation explaining the problem and possible options to address it.

3. Validation of the diffraction data

Most problems with and questions related to a structure report can be resolved just using the data available in the CIF. However, reflection data in computer-readable format will sometimes be needed in borderline cases for a detailed analysis of issues such as the correct symmetry description. Some problems, such as missed or ignored twinning as an explanation for an unsatisfactory refinement result, may only show up in an analysis of the reflection data.
The submission of reflection data as a structure-factor file (Fo/Fc data in CIF format) is required for a structural publication in Acta Crystallographica. This allows automatic checking for missed twinning. Absolute structure assignments are generally inferred from the value of the Flack parameter that is reported in the CIF (Flack, 1983). This value can be erroneous (Flack et al., 2006) and lead to false conclusions about enantiopurity. The availability of the reflection file allows software to check the reported value independently. This is performed by a comparison of the value of the reported Flack parameter with the value of the Hooft parameter (Hooft et al., 2008), which is calculated from the Bijvoet differences. The availability of reflection data also allows an independent structure determination and inspection of difference density Fourier maps for special features such as missing or incorrectly positioned H atoms. Unfortunately, the referees of chemical journals have no easy access to the reflection data since there is no deposition requirement by non-IUCr journals. Consequently, those primary data are also not archived. The Cambridge Structural Database does not archive reflection data either. The validation of Fo/Fc data is available with the standalone PLATON/VALIDATION software (http://www.cryst.chem.uu.nl), and will be available shortly through the IUCr checkCIF/PLATON web service. Validation utilizing the reflection data is currently implemented for papers submitted to Acta Crystallographica Sections C and E.

4. Examples

This section reviews a number of published structure reports that have been shown to be erroneous and for which a formal correction has appeared in the literature. There are many more (largely undocumented) examples of troublesome reports. Any analysis of the data for a subset of structures taken from the nearly 500 000 structures in the CSD will show outliers.
Most of these outliers point under close inspection to unresolved problems or errors of some sort rather than being of scientific interest. Unfortunately, in most cases the primary data (reflection data) are unavailable for a proper objective and definitive analysis.

4.1. Missed symmetry

The assignment of the correct space group of a structure to one of the possible 230 space groups can at times be problematic. The effective space group cannot always be assigned uniquely at the start of the structure analysis on the basis of the observed systematic absences alone. Often, preliminary structure solution only succeeds in a space group that turns out to be a subgroup of the real one. In fact, difficult structures can often only be solved in the lowest symmetry space group P1, leaving the transformation to the correct space group to be performed afterwards. Unfortunately, many examples in the literature (see Marsh & Spek, 2001) show that this goal is not always achieved. The required transformation is not always trivial. Software that suggests the real symmetry and performs the associated transformation is readily available (e.g. PLATON/ADDSYM), but is not always part of the refinement software suite being used. Some missed symmetry cases are relatively harmless in that this error does not seriously affect the structure and its interpretation (e.g. wrong Laue group), such as Example 1 below. On the other hand, overlooking an inversion centre is generally serious. This last problem can be hidden when structure refinement is performed by using constraints and restraints to secure the stability of the least-squares refinement. There are many borderline cases for which the reflection data are needed for a definitive space-group assignment.

4.1.1. Missed symmetry: Example 1

Fig. 1 illustrates an example of a structure that was published with one crystallographically independent molecule in the orthorhombic space group Pbca (Azumaya et al., 1995).
A program that displays a structure perpendicular to the main molecular plane by default will immediately show that this molecule has at least pseudo-threefold axial symmetry. Such an axis may or may not coincide with a crystallographic axis. The existence of crystallographic threefold symmetry was shown to be the case by Herbstein (1999). The correct cubic space-group assignment, Pa-3, would have been indicated by the current validation software.

4.1.2. Missed symmetry: Example 2

Fig. 2(a) illustrates the dramatic effect of the solution and erroneous refinement of a centrosymmetric structure in a noncentrosymmetric space group (Kahn et al., 2000a). Even just the published displacement ellipsoid plot of this structure, which has been refined in space group P1, should have aroused serious suspicion with the referees of the paper about the quality and correctness of the structure. This structure would have been a perfect candidate for the ‘ORTEP of the Year’ award (Harlow, 1996). It was only on the basis of a suggestion from a reader of the journal that this structure was re-refined in the centrosymmetric space group P-1. The correctly refined structure, shown in Fig. 2(b), clearly looks quite normal (Kahn et al., 2000b). Thus, what might have looked like a structure report based on very poor data turned out to be a good-quality structure after all. In this context, it is interesting that the detailed discussions in the original paper about the unusual differences in bond distances turned out in hindsight to be based on incorrectly interpreted refinement artifacts. The checkCIF/PLATON validation report (using the downloadable CIF) for the original P1 structure cites the space-group problem and numerous other issues.

4.2. Missing or incorrectly placed H atoms

Missing H atoms or too many H atoms in a reported molecular structure may have a significant impact on the interpretation of the chemistry or the nature of the compound.
H atoms are often introduced to the model at calculated positions without checking whether there is significant electron density at that location or are erroneously left out. Hydroxyl moieties generally have their H atom on a cone and pointing to a hydrogen-bond acceptor in the structure. Exceptions are rare and are generally the consequence of misplaced H-atom positioning, incomplete structures or wrong atom-type assignment.

4.2.1. Missing H atoms

Fig. 3 shows a structure that was published as a synthetic breakthrough with the title The stable pentamethylcyclopentadienyl cation (Lambert et al., 2002). Interesting chemistry building upon this result was envisioned. ‘Packing effects’ were offered as an explanation for the unusual nonplanarity of two substituents on the five-membered ring. It was rapidly shown by Otto et al. (2002) that the reported structure obviously needed two additional H atoms at sp3 positions on the five-membered ring and that the reported structure was actually the less interesting pentamethylcyclopentenyl cation. Given the availability of reflection data, it was easy to verify the presence of the two additional H atoms in a difference density map.

4.2.2. Wrongly placed H atom

Fig. 4(a) shows a structure with an incorrectly positioned hydroxyl H atom (Körner et al., 2000a). The problem cannot be seen in a published single-molecule ORTEP illustration. What is needed is an analysis of the intermolecular interactions. Fig. 4(b) illustrates the problem that was detected in a retrospective validation run. The correct hydrogen-bond network shown in Fig. 4(c) makes more sense (Körner et al., 2000b). Contoured difference electron-density maps can be very helpful in analyzing this type of problem. A misplaced H atom will show up as a negative density peak in its false location and the correct location will appear as a positive peak.

4.3. Incorrect atom-type assignments

The result of a crystal structure determination is not always the expected one. In such cases, atom-type assignments may be biased by preconceived ideas and assumptions. Linden (2007) reports several cases in which the reported chemical species is nearly certain to be wrong. Structures published as possessing —C=N—H groups may sometimes have resulted from a misinterpretation of —C=O groups. Zhong et al. (2007, 2008) report the retraction of a coordination complex with a missing H atom on an N atom and a central Sn(IV) atom that is most likely the cation of a lanthanide(III) coordination complex. Below are two further examples in which the reported chemistry was incorrect.

4.3.1. Withdrawn misinterpreted structure

Fig. 5 is an example of a structure report (Fang et al., 2007) on a ‘novel heterocyclic’ compound, crystals of which were obviously obtained unexpectedly from a reaction mixture. A reader (an Acta Crystallographica Section C Co-editor) recognized this structure as being at least isomorphous with the well known structure of the mineral borax. Closer inspection revealed that the two compounds were indeed identical. The displacement ellipsoids of the N and C atoms clearly suggested that they should be interpreted as the atom types O and B, respectively. Hirshfeld (1976) rigid-bond test ALERTS sent out similar signals. The structure report was subsequently retracted (Fang et al., 2008).

4.3.2. Charge-balance problem

Fig. 6 shows a published network structure (Sadiq-ur-Rehman et al., 2007) that was obtained unexpectedly. It is not clear from the reaction conditions where the NO₃⁻ anion in the proposed structure is supposed to come from. In addition, there is also a charge-balance problem that was obviously overlooked by both the authors and the referees of the paper. An anion with a −2 charge is needed.
The same authors (Sadiq-ur-Rehman et al., 2008) have now corrected the structure in view of the charge-balance problem. The NO₃⁻ anion was replaced by CO₃²⁻, as suggested by the unusual size of the displacement ellipsoid of N in the NO₃⁻ version. Generally, such a change of atom type would result in significantly better displacement parameters and refinement results. In this case, no significant improvement was observed. Interestingly, the revised report also does not mention that the reflection data were from a merohedrally twinned crystal. Part of the reason for this might be that the current CIF file definition (and for that reason software such as SHELXL) does not yet offer a standard means of recording twinning in a CIF. The twinning correction that was correctly applied was detected as part of the validation of the reflection file. On the other hand, the general implementation of a check for charge balance is a challenging validation issue.

5. Evaluation and discussion

An analysis of the ALERTS generated for the 35 760 entries added to the CSD from 2006 and early 2007 indicates that validation and the provision of adequate responses to the issues raised still have room for improvement. 384 space-group changes were indicated. Other frequently reported problems are unaccounted-for solvent-accessible voids and numerous problems with H atoms. Some ALERTS require an in-depth analysis by experts. Investigators not trained in crystallography may have no clue as to what to do with ALERTS about symmetry issues, as may be gleaned from queries such as ‘What does it mean: space group incorrect’. A recent example of a structure with a space-group-related ALERT is the structure report of a small organic molecule that is correctly reported by Portilla et al. (2008) in space group P-1 (Fig. 7).
Validation suggests space group C2/m within default error tolerances as a higher symmetry alternative, which makes sense since the basic molecule has an approximate mirror plane. In fact, this structure easily solves and refines in C2/m when instructed to do so, although with a higher R factor. The evidence against C2/m is that the atomic displacement parameters in the t-butyl moiety are high. In addition, the proposed transformation from triclinic to monoclinic symmetry leads to α and γ angles that differ by 0.3° from the 90° required for monoclinic symmetry. The published structure is based on 120 K data and may well have exact C2/m symmetry at higher temperature. The Hirshfeld rigid-bond test (Hirshfeld, 1976) has proved to be very effective in revealing problems in a structure. It is assumed in this test that two bonded atoms vibrate along the bond with approximately equal amplitude. Significant differences, i.e. those which deviate by more than a few standard uncertainties from zero, need close examination. Notorious exceptions are metal-to-carbonyl bonds, which generally show much larger differences (Braga & Koetzle, 1988).

6. What next?

Crystallographic procedures evolve. This also has an impact on structure-validation procedures. A number of currently implemented validation issues are related to data-collection techniques that are based on serial detectors. Those detectors have now largely been superseded by image-plate or CCD-based instruments, which may themselves become obsolete with the arrival of a new generation of (pixel) detectors that allow shutterless data collection. Before the introduction of two-dimensional detectors, corrections for absorption were performed using a multitude of techniques that ranged from purely empirical to an exact calculation based on a description of the crystal shape. Tests were implemented to validate the appropriate use of the chosen method.
Nowadays, with two-dimensional detector data, a correction for absorption is mostly of the multi-scan type (e.g. SADABS; Sheldrick, 2008) convoluted with inter-image scaling and optionally preceded by a numerical correction for absorption on the basis of a description of the crystal shape. New up-to-date validation tests for this are needed. Current validation does not yet validate the results of powder diffraction, incommensurate structures and charge-density studies. The same applies to the more involved issues with inorganic compounds. The geometry of a newly determined structure can be validated against similar structures in the CSD (Allen, 2002; Bruno et al., 2004). This is easily performed manually but is not easy to automate. An interesting development is the arrival on the market of automated bench-top ‘crystal-to-structure’ instruments. This might pose an interesting challenge to journals and validation software when structure reports from such machines run in black-box mode arrive on editors’ desks. Formal crystallographic training has disappeared in many places, so inexperienced authors might be confronted with difficult-to-answer ALERT queries. Regular crystallographic training courses are still organized on a national or international basis and should be strongly supported.

7. Concluding remarks

Structure validation has become a standard procedure in small-molecule crystallography. It sets a quality standard that is not just based on low final R factors and can save a lot of time for both the investigator and the referees of a paper. A short or zero-length list of minor ALERTS may indicate a good structure. Some ALERTS may even point to interesting structural features that would otherwise have gone unnoticed and are worth discussing in a publication. Examples are pseudo-symmetry and short intermolecular contacts. Some ALERTS reveal issues that can only be addressed by experienced crystallographers.
An example is whether a given structure is best described as disordered in a centrosymmetric space group or as ordered in a noncentrosymmetric space group (Flack et al., 2006). The scope of the currently implemented checkCIF/PLATON validation procedures is high-resolution small-molecule crystal structures. Extension to large or low-resolution protein structures is not envisioned. As an example, the PLATON/ADDSYM algorithm that is used to detect missing symmetry requires atomic resolution data. The automated structure-validation techniques that are currently applied to submissions to Acta Crystallographica have essentially eliminated long-standing errors, such as missed higher symmetry, in Acta Crystallographica Sections B, C and E. This is unfortunately not yet the case for many other journals. Class IV structures still appear in the chemical literature. Structures are still published in space groups of too low symmetry despite the many papers on this issue by Dick Marsh entitled ‘More space group changes’ (see, for example, Marsh & Herbstein, 1988). Most major journals state structure validation as a requirement in their Notes for Authors. However, in practice it appears that many structures are published without serious inspection of the crystallographic data by an expert. An often-heard comment is ‘addressing crystallographic details holds up the publication of important chemistry’. In many cases, these crystallographic details are just trivial pieces of information that should already have been included as a standard protocol in the CIF at the end of the structure analysis. Database services, such as the Cambridge Crystallographic Data Centre (CCDC; Allen, 2002), attempt to sort out some of the obvious problems by consultation with the authors, but the CCDC staff cannot add any judgment or correction without the consent of the authors.
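
The Hirshfeld rigid-bond test described in the excerpt above compares the vibration amplitudes of two bonded atoms along their bond. The following is a minimal Python sketch of that comparison, not the checkCIF/PLATON implementation. It assumes that the displacement tensors are already expressed in Cartesian axes in Å² (CIF values would first need converting for non-orthogonal cells), that the standard uncertainty of the difference is supplied by the caller, and that ‘a few standard uncertainties’ is read as three. The function names, the threshold and all numerical values are hypothetical.

```python
import math


def msda_along(u_cart, direction):
    """Mean-square displacement amplitude n^T U n along `direction`.

    u_cart    -- 3x3 symmetric displacement tensor in Cartesian axes (A^2)
    direction -- any non-zero 3-vector; it is normalised here
    """
    length = math.sqrt(sum(c * c for c in direction))
    n = [c / length for c in direction]
    return sum(n[i] * u_cart[i][j] * n[j] for i in range(3) for j in range(3))


def rigid_bond_delta(pos_a, u_a, pos_b, u_b):
    """Difference of the two amplitudes along the A-B bond (A^2)."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    return msda_along(u_a, bond) - msda_along(u_b, bond)


def needs_examination(delta, sigma, n_sigma=3.0):
    """True if delta differs from zero by more than n_sigma * sigma.

    n_sigma=3 is an assumed reading of "a few standard uncertainties".
    """
    return abs(delta) > n_sigma * sigma


# Hypothetical C-C bond along x, 1.5 A long; all U values are invented.
c1 = (0.0, 0.0, 0.0)
c2 = (1.5, 0.0, 0.0)
u_c1 = [[0.020, 0.0, 0.0], [0.0, 0.030, 0.0], [0.0, 0.0, 0.030]]
u_c2 = [[0.021, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.040]]
u_bad = [[0.040, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.040]]

ok_delta = rigid_bond_delta(c1, u_c1, c2, u_c2)
bad_delta = rigid_bond_delta(c1, u_c1, c2, u_bad)
print(needs_examination(ok_delta, sigma=0.001))
print(needs_examination(bad_delta, sigma=0.001))
```

In this invented example the first pair differs by about 0.001 Å² along the bond and is not flagged, whereas the second differs by about 0.020 Å² and is. Only the tensor component along the bond enters the test, so the large differences perpendicular to the bond in both pairs have no effect. A wrongly assigned atom type, as in the borax case above, would typically show up as a large difference of this kind.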

                Author and article information

                Journal
                Acta Crystallogr A Found Adv
                Acta Cryst. A
                Acta Crystallographica. Section A, Foundations and Advances
                International Union of Crystallography
                2053-2733
                01 January 2015
                Volume: 71
                Issue: Pt 1 (publisher ID: a150100)
                Pages: 3-8
                Affiliations
                [a ]Department of Structural Chemistry, Georg-August Universität Göttingen , Tammannstrasse 4, Göttingen, 37077, Germany
                Author notes
                Article
                sc5086 ACSAD7 S2053273314026370
                10.1107/S2053273314026370
                4283466
                25537383
                762384b6-6c94-4f0e-85c8-7162e51ad666
                © George M. Sheldrick 2015

                This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

                History
                Received: 12 November 2014
                Accepted: 01 December 2014
                Categories
                Research Papers

                patterson superposition,direct methods,dual-space recycling,space-group determination,element assignment
