Multi-target Parallel Processing Approach for Gene-to-structure Determination of the Influenza Polymerase PB2 Subunit

Abstract

Pandemic outbreaks of highly virulent influenza strains can cause widespread morbidity and mortality in human populations worldwide. In the United States alone, an average of 41,400 deaths and 1.86 million hospitalizations are caused by influenza virus infection each year ¹. Point mutations in the polymerase basic protein 2 subunit (PB2) have been linked to the adaptation of the viral infection in humans ². Findings from such studies have revealed the biological significance of PB2 as a virulence factor, thus highlighting its potential as an antiviral drug target.

The structural genomics program put forth by the National Institute of Allergy and Infectious Diseases (NIAID) provides funding to Emerald Bio and three other Pacific Northwest institutions that together make up the Seattle Structural Genomics Center for Infectious Disease (SSGCID). The SSGCID is dedicated to providing the scientific community with three-dimensional protein structures of NIAID category A-C pathogens. Making such structural information available to the scientific community serves to accelerate structure-based drug design.

Structure-based drug design plays an important role in drug development. Pursuing multiple targets in parallel greatly increases the chance of success for new lead discovery by targeting a pathway or an entire protein family. Emerald Bio has developed a high-throughput, multi-target parallel processing pipeline (MTPP) for gene-to-structure determination to support the consortium. Here we describe the protocols used to determine the structure of the PB2 subunit from four different influenza A strains.

Most cited references 10

Phaser crystallographic software

Airlie J. McCoy, Ralf W. Grosse-Kunstleve, Paul D. Adams … (2007)

1. Introduction

Improved crystallographic methods rely on both improved automation and improved algorithms. The software handling one part of structure solution must be automatically linked to software handling parts upstream and downstream of it in the structure solution pathway with (ideally) no user input, and the algorithms implemented in the software must be of high quality, so that the branching or termination of the structure solution pathway is minimized or eliminated. Automation allows all the choices in structure solution to be explored where the patience and job-tracking abilities of users would be exhausted, while good algorithms give solutions for poorer models, poorer data or unfavourable crystal symmetry. Both forms of improvement are essential for the success of high-throughput structural genomics (Burley et al., 1999).

Macromolecular phasing by either of the two main methods, molecular replacement (MR) and experimental phasing, which includes the technique of single-wavelength anomalous dispersion (SAD), is a key part of the structure solution pathway with potential for improvement in both automation and the underlying algorithms. MR and SAD are good phasing methods for the development of structure solution pipelines because they only involve the collection of a single data set from a single crystal and have the advantage of minimizing the effects of radiation damage. Phaser aims to facilitate automation of these methods through ease of scripting, and to facilitate the development of improved algorithms for these methods through the use of maximum likelihood and multivariate statistics.

Other software shares some of these features. For molecular replacement, AMoRe (Navaza, 1994) and MOLREP (Vagin & Teplyakov, 1997) both implement automation strategies, though they lack likelihood-based scoring functions. Likelihood-based experimental phasing can be carried out using Sharp (La Fortelle & Bricogne, 1997).

2. Algorithms

The novel algorithms in Phaser are based on maximum likelihood probability theory and multivariate statistics rather than the traditional least-squares and Patterson methods. Phaser has novel maximum likelihood phasing algorithms for the rotation functions and translation functions in MR and the SAD function in experimental phasing, but also implements other non-likelihood algorithms that are critical to success in certain cases. Summaries of the algorithms implemented in Phaser are given below. For completeness and for consistency of notation, some equations given elsewhere are repeated here.

2.1. Maximum likelihood

Maximum likelihood is a branch of statistical inference that asserts that the best model on the evidence of the data is the one that explains what has in fact been observed with the highest probability (Fisher, 1922). The model is a set of parameters, including the variances describing the error estimates for the parameters. The introduction of maximum likelihood estimators into the methods of refinement, experimental phasing and, with Phaser, MR has substantially increased success rates for structure solution over the methods that they replaced. A set of thought experiments with dice (McCoy, 2004) demonstrates that likelihood agrees with our intuition and illustrates the key concepts required for understanding likelihood as it is applied to crystallography.

The likelihood of the model given the data is defined as the probability of the data given the model. Where the data have independent probability distributions, the joint probability of the data given the model is the product of the individual distributions. In crystallography, the data are the individual reflection intensities. These are not strictly independent, and indeed the statistical relationships resulting from positivity and atomicity underlie direct methods for small-molecule structures (reviewed by Giacovazzo, 1998).
For macromolecular structures, these direct-methods relationships are weaker than effects exploited by density modification methods (reviewed by Kleywegt & Read, 1997); the presence of solvent means that the molecular transform is over-sampled, and if there is noncrystallographic symmetry then other correlations are also present. However, the assumption of independence is necessary to make the problem tractable and works well in practice.

To avoid the numerical problems of working with the product of potentially hundreds of thousands of small probabilities (one for each reflection), the log of the likelihood is used. This has a maximum at the same set of parameters as the original function. Maximum likelihood also has the property that if the data are mathematically transformed to another function of the parameters, then the likelihood optimum will occur at the same set of parameters as the untransformed data. Hence, it is possible to work with either the structure-factor intensities or the structure-factor amplitudes. In the maximum likelihood functions in Phaser, the structure-factor amplitudes (Fs), or normalized structure-factor amplitudes (Es, which are Fs normalized so that the mean-square values are 1) are used.

The crystallographic phase problem means that the phase of the structure factor is not measured in the experiment. However, it is easiest to derive the probability distributions in terms of the phased structure factors and then to eliminate the unknown phase by integration, a process known as integrating out a nuisance variable (the nuisance variable being the introduced phase of the observed structure factor, or equivalently the phase difference between the observed structure factor and its expected value).
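The numerical argument for working with the log of the likelihood can be illustrated with a small sketch. The per-reflection probabilities below are hypothetical values chosen only to show the effect; they are not taken from Phaser or from any data set.

```python
import math

def log_likelihood(probabilities):
    """Sum of log-probabilities, equal to the log of their product."""
    return sum(math.log(p) for p in probabilities)

# Hypothetical per-reflection probabilities (illustrative only):
# 200,000 reflections, each with probability 0.001 under the model.
probs = [1e-3] * 200_000

# Direct product of independent probabilities underflows double precision.
product = 1.0
for p in probs:
    product *= p

# The log-likelihood of the same data stays finite and usable.
ll = log_likelihood(probs)

print(product)  # 0.0 after underflow
print(ll)       # large negative but finite
```

Because the logarithm is monotonic, the parameters that maximize `ll` are the same as those that would maximize the product, which is the property the text relies on.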
The central limit theorem applies to structure factors, which are sums of many small atomic contributions, so the probability distribution for an acentric reflection, $F_O$, given the expected value of $F_O$ ($\langle F_O \rangle$), is a two-dimensional Gaussian with variance Σ centred on $\langle F_O \rangle$. (Note that here and in the following, bold font is used to represent complex or signed structure factors, and italics to represent their amplitudes.) In applications to molecular replacement and structure refinement, $\langle F_O \rangle$ is the structure factor calculated from the model ($F_C$) multiplied by a fraction D (where 0 < D < 1) […]

[…] > R, H = 0. The atoms are taken to be of equal mass. The eigenvalues λ and eigenvectors U of H can then be calculated. The eigenvalues are directly proportional to the squares of the vibrational frequencies of the normal modes, the lowest eigenvalues thus giving the lowest normal modes. Six of the eigenvalues will be zero, corresponding to the six degrees of freedom for a rotation and translation of the entire structure.

For all but the smallest proteins, eigenvalue decomposition of the all-atom Hessian is not computationally feasible with current computer technology. Various methods have been developed to reduce the size of the eigenvalue problem. Bahar et al. (1997) and Hinsen (1998) have shown that it is possible to find the lowest frequency normal modes of proteins in the elastic network model by considering amino acid Cα atoms only. However, this merely postpones the computational problem until the proteins are an order of magnitude larger. The problem is solved for any size protein with the rotation–translation block (RTB) approach (Durand et al., 1994; Tama et al., 2000), where the protein is divided into blocks of atoms and the rotation and translation modes for each block are used to project the full Hessian into a lower dimension. The projection matrix is a block-diagonal matrix of dimensions 3N × 3N.
Each of the NB block matrices $P_{nb}$ has dimensions $3N_{nb} \times 6$, where $N_{nb}$ is the number of atoms in the block nb. For atom j in block nb, displaced from the centre of mass of the block, the 3 × 6 matrix $P_{nb,j}$ is

[equation not preserved in the source]

The first three columns of the matrix contain the infinitesimal translation eigenvectors of the block and the last three columns contain the infinitesimal rotation eigenvectors of the block. The orthogonal basis Q of $P_{nb}$ is then found by QR decomposition:

$$P_{nb} = Q_{nb} R_{nb},$$

where $Q_{nb}$ is a $3N_{nb} \times 6$ orthogonal matrix and $R_{nb}$ is a 6 × 6 upper triangular matrix. H can then be projected into the subspace spanned by the translation/rotation basis vectors of the blocks:

$$H_P = Q^{T} H Q,$$

where Q is the block-diagonal matrix assembled from the $Q_{nb}$. The eigenvalues $\lambda_P$ and eigenvectors $U_P$ of the projected Hessian are then found. The RTB method is able to restrict the size of the eigenvalue problem for any size of protein with the inclusion of an appropriately large $N_{nb}$ for each block. In the implementation of the RTB method in Phaser, $N_{nb}$ for each block is set for each protein such that the total size of the eigenvalue problem is restricted to a matrix $H_P$ of maximum dimensions 750 × 750. This enables the eigenvalue problem to be solved in a matter of minutes with current computing technology. The eigenvectors of the translation/rotation subspace can then be expanded back to the atomic space (dimensions of U are N × N):

$$U = Q U_P.$$

As for the decomposition of the full Hessian H, the eigenvalues are directly proportional to the squares of the vibrational frequencies of the normal modes, the lowest eigenvalues thus giving the lowest normal modes. Although the eigenvalues and eigenvectors generated from decomposition of the full Hessian and using the RTB approach will diverge with increasing frequency, the RTB approach is able to model with good accuracy the lowest frequency normal modes, which are the modes of interest for looking at conformational difference in proteins. The all-atom, Cα only and RTB normal-mode analysis methods are implemented in Phaser.
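The per-atom block of the RTB projection is described above only in words. The following is one plausible form consistent with that description, written under stated assumptions: equal atomic masses (so no mass weighting appears), a displacement $r_j = (x_j, y_j, z_j)$ of atom j from the block centre of mass, and a rotation sign convention in which column 3 + k is $e_k \times r_j$. The symbols $x_j$, $y_j$, $z_j$ and the sign convention are illustrative choices, not taken from the excerpt.

```latex
% Assumed per-atom 3 x 6 block: identity (translations) | cross-product matrix (rotations)
P_{nb,j} =
\begin{pmatrix}
1 & 0 & 0 & 0    & z_j  & -y_j \\
0 & 1 & 0 & -z_j & 0    & x_j  \\
0 & 0 & 1 & y_j  & -x_j & 0
\end{pmatrix}

% Size of the projected problem with six rigid-body modes per block:
\dim H_P = 6\,NB \times 6\,NB
\quad\Longrightarrow\quad
6\,NB \le 750 \;\Rightarrow\; NB \le 125
```

Read this way, the 750 × 750 limit quoted for Phaser corresponds to at most 125 blocks, so larger proteins are accommodated by placing more atoms in each block rather than by adding blocks.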
After normal-mode analysis, n normal modes can be used to generate $2^n − 1$ (nonzero) combinations of normal modes. Phaser allows the user to specify the r.m.s. deviation between model and target desired by the perturbation, and the fraction dq of the displacement vector for each mode combination corresponding to each model combination is then used to generate the models. Large r.m.s. deviations will cause the geometry of the model to become distorted. Phaser reports when the model becomes so distorted that there are Cα clashes in the structure.

2.4. Packing function

The packing of potential solutions in the asymmetric unit is not inherently part of the translation function. It is therefore possible that an arrangement of models has a high log-likelihood gain, although the models may overlap and therefore be physically unreasonable. The packing of the solutions is checked using a clash test using a subset of the atoms in the structure: the ‘trace’ atoms. For proteins, the trace atoms are the Cα positions, spaced at 3.8 Å. For nucleic acid, the phosphate and C atoms in the ribose-phosphate backbone and the N atoms of the bases are selected as trace atoms. These atoms are also spaced at about 3.8 Å, so that the density of trace atoms in nucleic acid is similar to that of proteins, which makes the number of protein–protein, protein–nucleic acid and nucleic acid–nucleic acid clashes comparable where there is a mixed protein–nucleic acid structure. For the clash test, the number of trace atoms from another model within a given distance (default 3 Å) is counted. The clash test includes symmetry-related copies of the model under consideration, other components in the asymmetric unit and their symmetry-related copies. If the search model has a low sequence identity with the target, or has large flexible loops that could adopt an alternative conformation, the number of clashes may be expected to be nonzero.
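The clash test described above can be sketched in a few lines. This is a minimal illustration, not Phaser's code: the function name `count_clashes` and the coordinates are hypothetical, and symmetry-related copies are left out, so only two explicit sets of trace atoms are compared.

```python
import math

def count_clashes(model_trace, other_trace, cutoff=3.0):
    """Count trace atoms of `other_trace` lying within `cutoff` (in Å)
    of any trace atom of `model_trace`.

    Assumption: symmetry copies are not generated here; a full test would
    also compare against symmetry-related copies of every component.
    """
    clashes = 0
    for b in other_trace:
        if any(math.dist(a, b) < cutoff for a in model_trace):
            clashes += 1
    return clashes

# Hypothetical Cα traces (Å), spaced 3.8 Å apart along x.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
other = [(0.0, 2.0, 0.0), (3.8, 5.0, 0.0), (7.6, 2.9, 0.0)]

n_clashes = count_clashes(model, other)
print(n_clashes)  # 2: the first and third atoms of `other` are closer than 3 Å
```

A placement would then be accepted or rejected by comparing `n_clashes` with the allowed number of clashes.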
By default the best packing solutions are carried forward, although a specific number of allowed clashes may also be given as the cut-off for acceptance. However, it is better to edit models before use so that structurally nonconserved surface loops are excluded, as they will only contribute noise to the rotation and translation functions. Where an ensemble of structures is used as the model, the highest homology model is taken as the template for the packing search. Before this model is used, the trace atom positions are edited to take account of large conformational differences between the models in the ensemble. Equivalent trace atom positions are compared and if the coordinates deviate by more than 3 Å then the template trace atom is deleted. Thus, use of an ensemble not only improves signal to noise in the maximum likelihood search functions, it also improves the discrimination of possible solutions by the packing function.

2.5. Minimizer

Minimization is used in Phaser to optimize the parameters against the appropriate log-likelihood function in the anisotropy correction, in MR (refines the position and orientation of a rigid-body model) and in SAD phasing. The same minimizer code is used for all three applications and has been designed to be easily extensible to other applications. The minimizer for the anisotropy correction uses Newton’s method, while MR and SAD use the standard Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. Both minimization methods in Phaser include a line search. The line search algorithm is a basic iterative method for finding the local minimum of a target function f. Starting at parameters x, the algorithm finds the minimum (within a convergence tolerance) of

$$f(\mathbf{x} + \gamma \mathbf{d})$$

by varying γ, where γ is the step distance along a descent direction d. Newton’s method and the BFGS algorithm differ in the determination of the descent direction d that is passed to the line search, and thus the speed of convergence.
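The line search over γ can be sketched as follows. The sketch shrinks a bracket on γ by golden-section steps, which requires no assumption that the function is quadratic. The function names, the fixed bracket `[0, gamma_max]` and the test function are assumptions made for illustration; they do not describe Phaser's implementation.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618

def golden_section_min(f, lo, hi, tol=1e-8):
    """Locate the minimum of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper interior point.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower interior point.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2

def line_search(f, x, d, gamma_max=1.0):
    """Minimize f(x + gamma * d) over gamma in [0, gamma_max]."""
    def along_direction(gamma):
        return f([xi + gamma * di for xi, di in zip(x, d)])
    return golden_section_min(along_direction, 0.0, gamma_max)

# Hypothetical target: a quadratic bowl with its minimum at (1, -2).
def target(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

gamma = line_search(target, x=[0.0, 0.0], d=[1.0, -2.0], gamma_max=2.0)
print(round(gamma, 6))  # close to 1.0, the step that lands on the minimum
```

Each pass reuses one of the two interior function values, so only one new evaluation of the target is needed per iteration.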
Within one cycle of the line search (where there is no change in d) the trial step distances γ are chosen using the golden section method. The golden ratio ($5^{1/2}/2 + 1/2$) divides a line so that the ratio of the larger part to the total is the same as the ratio of the smaller to larger. The method makes no assumptions about the function’s behaviour; in particular, it does not assume that the function is quadratic within the bracketed section. If this assumption were made, the line search could proceed via parabolic interpolation.

Newton’s method uses the Hessian matrix H of second derivatives and the gradient g at the initial set of parameters $x_0$ to find the values of the parameters at the minimum $x_{min}$. If the function is quadratic in x then Newton’s method will find the minimum in one step, but if not, iteration is required. The method requires the inversion of the Hessian matrix, which, for large matrices, consumes a large amount of computational time and memory resources. The eigenvalues of the Hessian need to be positive for the function to be at a minimum, rather than a maximum or saddle point, since the method converges to any point where the gradient vector is zero. When used with the anisotropy correction, the full Hessian matrix is calculated analytically.

The BFGS algorithm is one of the most powerful minimization methods when calculation of the full Hessian using analytic or finite difference methods is very computationally intensive. At every step, the gradient search vector is analysed to build up an approximate Hessian matrix H, in order to make the resulting search vector direction d better than the original gradient vector direction. In the ‘pure’ form of the BFGS algorithm, the method is started with matrix H equal to the identity matrix. The off-diagonal elements of the Hessian, the mixed second derivatives (i.e. $\partial^2 LL / \partial p_i \partial p_j$) are thus initially zero.
As the BFGS cycle proceeds, the off-diagonal elements become nonzero using information derived from the gradient. However, in Phaser, the matrix H is not the identity but rather is seeded with diagonal elements equal to the second derivatives of the log-likelihood target function (LL) with respect to the parameters $p_i$ (i.e. $\partial^2 LL / \partial p_i^2$, or curvatures), the values found in the ‘true’ Hessian. For the SAD refinement the diagonal elements are calculated analytically, but for the MR refinement the diagonal elements are calculated by finite difference methods. Seeding the Hessian with the diagonal elements dramatically accelerates convergence when the parameters are on different scales; when an identity matrix is used, the parameters on a larger scale can fail to shift significantly because their gradients tend to be smaller, even though the necessary shifts tend to be larger. In the inverse Hessian, small curvatures for parameters on a large scale translate into large scale factors applied to the corresponding gradient terms. If any of these curvature terms are negative (as may happen when the parameters are far from their optimal values), the matrix is not positive definite. Such a situation is corrected by using problem-specific information on the expected relative scale of the parameters from the ‘large-shift’ variable, as discussed below in §2.5.1.

In addition to the basic minimization algorithms, the minimizer incorporates the ability to bound, constrain, restrain and reparameterize variables, as discussed in detail below. Bounds must be applied to prevent parameters becoming nonphysical, constraints effectively reduce the number of parameters, restraints are applied to include prior probability information, and reparameterization of variables makes the parameter space more quadratic and improves the performance of the minimizer.

2.5.1. Problem-specific parameter scaling information

When a function is defined for minimization in Phaser, information must be provided on the relative scales of the parameters of that function, through a ‘large-shifts’ variable. As its name implies, the variable defines the size of a parameter shift that would be considered ‘large’ for each parameter. The ratios of these large-shift values thus specify prior knowledge about the relative scales of the different parameters for each problem. Suitable large-shift values are found by a combination of physical insight (e.g. the size of a coordinate shift considered to be large will be proportional to $d_{min}$ for the data set) and numerical simulations, studying the behaviour of the likelihood function as parameters are varied systematically in a variety of test cases.

The large-shifts information is used in two ways. Firstly, it is used to prevent the line search from taking an excessively large step, which can happen if the estimated curvature for a parameter happens to be too small and can lead to the refinement becoming numerically unstable. If the initial step for a line search would change any parameter by more than its large-shift value, the initial step is scaled down. Secondly, it is used to provide relative scale information to correct negative curvature values. Parameters with positive curvatures are used to define the average relationship between the large-shift values and the curvatures, which can then be used to compute appropriate curvature values for the parameters with negative curvatures. This stabilizes the refinement until it is sufficiently close to the minimum that all curvatures become positive.

2.5.2. Reparameterization

Second-order minimization algorithms in effect assume that, at least in the region around the minimum, the function can be approximated as a quadratic. Where this assumption holds, the minimizer will converge faster.
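The first use of the large-shift values in §2.5.1, scaling down an over-long initial step, can be sketched as below. The function name and the numbers are hypothetical, and the sketch assumes the whole step is scaled by a single factor so that the search direction is preserved; the excerpt does not say how Phaser applies the scaling.

```python
def limit_initial_step(step, large_shifts):
    """Scale `step` uniformly so that no parameter moves by more than its
    large-shift value.

    Assumption: one common scale factor is applied to every parameter,
    which keeps the search direction unchanged.
    """
    ratios = [abs(s) / ls for s, ls in zip(step, large_shifts) if ls > 0]
    worst = max(ratios, default=0.0)
    scale = 1.0 if worst <= 1.0 else 1.0 / worst
    return [s * scale for s in step]

# Hypothetical proposed step and large-shift values for three parameters.
proposed = [0.5, 4.0, -1.0]
large = [1.0, 2.0, 2.0]

limited = limit_initial_step(proposed, large)
print(limited)  # [0.25, 2.0, -0.5]: the second parameter now moves by exactly its large shift
```

A step that already respects every large-shift value is returned unchanged.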
It is therefore advantageous to use functions of the parameters being minimized so that the target function is more quadratic in the new parameter space than in the original parameter space (Edwards, 1992). For example, atomic B factors tend to converge slowly to their refined values because the B factor appears in the exponential term in the structure-factor equation. Although any function of the parameters can be used for this purpose, we have found that taking the logarithm of a parameter is often the most effective reparameterization operation (not only for the B factors):

$$x' = \ln(x + x_{offset}).$$

The offset $x_{offset}$ is chosen so that the value of x′ does not become undefined for allowed values of x, and to optimize the quadratic nature of the function in x′. For instance, atomic B factors are reparameterized using an offset of 5 Å², which allows the B factors to approach zero and also has the physical interpretation of accounting roughly for the width of the distribution of electrons for a stationary atom.

2.5.3. Bounds

Bounds on the minimization are applied by setting upper and/or lower limits for each variable where required (e.g. occupancy minimum set to zero). If a parameter reaches a limit during a line search, that line search is terminated. In subsequent line searches, the gradient of that parameter is set to zero whenever the search direction would otherwise move the parameter outside of its bounds. Multiplying the gradient by the step size thus does not alter the value of the parameter at its limit. The parameter will remain at its limit unless calculation of the gradient in subsequent cycles of minimization indicates that the parameter should move away from the boundary and into the allowed range of values.

2.5.4. Constraints

Space-group-dependent constraints apply to the anisotropic tensor applied to $\Sigma_N$ in the anisotropic diffraction correction. Atoms on special positions also have constraints on the values of their anisotropic tensor.
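Returning briefly to the logarithmic reparameterization of §2.5.2, a minimal sketch of the change of variable and of how a gradient transforms under it is given below. The function names are hypothetical; the 5 Å² offset is the value quoted in the text, and the gradient rule is simply the chain rule applied to that change of variable.

```python
import math

B_OFFSET = 5.0  # Å², the offset quoted in the text for atomic B factors

def to_log_space(b, offset=B_OFFSET):
    """Map a B factor to the reparameterized variable x' = ln(b + offset)."""
    return math.log(b + offset)

def from_log_space(x_prime, offset=B_OFFSET):
    """Invert the mapping: b = exp(x') - offset."""
    return math.exp(x_prime) - offset

def gradient_in_log_space(dLL_db, b, offset=B_OFFSET):
    """Chain rule: dLL/dx' = dLL/db * db/dx' = dLL/db * (b + offset)."""
    return dLL_db * (b + offset)

# A B factor of zero maps to a finite value, so the minimizer can approach it.
x_zero = to_log_space(0.0)

# Round trip for a hypothetical B factor of 30 Å².
b_back = from_log_space(to_log_space(30.0))

print(x_zero)   # ln(5), about 1.609
print(b_back)   # about 30.0
```

The minimizer would work in x′ and convert back with `from_log_space` only when the B factor itself is needed.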
The anisotropic displacement ellipsoid must remain invariant under the application of each symmetry operator of the space group or site-symmetry group, respectively (Giacovazzo, 1992; Grosse-Kunstleve & Adams, 2002). These constraints reduce the number of parameters by either fixing some values of the anisotropic B factors to zero or setting some sets of B factors to be equal. The derivatives in the gradient and Hessian must also be constrained to reflect the constraints in the parameters.

2.5.5. Restraints

Bayes’ theorem describes how the probability of the model given the data is related to the likelihood and gives a justification for the use of restraints on the parameters of the model:

$$P(\text{model} \mid \text{data}) = \frac{P(\text{data} \mid \text{model})\, P(\text{model})}{P(\text{data})}.$$

If the probability of the data is taken as a constant, then

$$P(\text{model} \mid \text{data}) \propto P(\text{data} \mid \text{model})\, P(\text{model}),$$

where P(model) is called the prior probability. When the logarithm of the above equation is taken,

$$\ln P(\text{model} \mid \text{data}) = \ln P(\text{data} \mid \text{model}) + \ln P(\text{model}) + \text{constant}.$$

Prior probability is therefore introduced into the log-likelihood target function by the addition of terms. If parameters of the model are assumed to have independent Gaussian probability distributions, then the Bayesian view of likelihood will lead to the addition of least-squares terms and hence least-squares restraints on those parameters, such as the least-squares restraints applied to bond lengths and bond angles in typical macromolecular structure refinement programs. In Phaser, least-squares terms are added to restrain the B factors of atoms to the Wilson B factor in SAD refinement, and to restrain the anisotropic B factors to being more isotropic (the ‘sphericity’ restraint). A similar sphericity restraint is used in SHELXL (Sheldrick, 1995) and in REFMAC5 (Murshudov et al., 1999).

3. Automation

Phaser is designed as a large set of library routines grouped together and made available to users as a series of applications, called modes. The routine-groupings in the modes have been selected mainly on historical grounds; they represent traditional steps in the structure solution pipeline.
There are 13 such modes in total: ‘anisotropy correction’, ‘cell content analysis’, ‘normal-mode analysis’, ‘ensembling’, ‘fast rotation function’, ‘brute rotation function’, ‘fast translation function’, ‘brute translation function’, ‘log-likelihood gain’, ‘rigid-body refinement’, ‘single-wavelength anomalous dispersion’, ‘automated molecular replacement’ and ‘automated experimental phasing’. The ‘automated molecular replacement’ and ‘automated experimental phasing’ modes are particularly powerful and aim to fully automate structure solution by MR and SAD, respectively.

Aspects of the decision making within the modes are under user input control. For example, the ‘fast rotation function’ mode performs the ensembling calculation, then a fast rotation function calculation and then rescores the top solutions from the fast search with a brute rotation function. There are three possible fast rotation function algorithms and two possible brute rotation functions to choose from. There are four possible criteria for selecting the peaks in the fast rotation function for rescoring with the brute rotation function, and for selecting the results from the rescoring for output. Alternatively, the rescoring of the fast rotation function with the brute rotation function can be turned off to produce results from the fast rotation function only. Other modes generally have fewer routines but are designed along the same principles (details are given in the documentation).

3.1. Automated molecular replacement

Most structures that can be solved by MR with Phaser can be solved using the ‘automated molecular replacement’ mode. The flow diagram for this mode is shown in Fig. 1.
The search strategy automates four search processes: those for multiple components in the asymmetric unit, for ambiguity in the hand of the space group and/or other space groups in the same point group, for permutations in the search order for components (when there are multiple components), and for finding the best model when there is more than one possible model for a component.

3.1.1. Multiple components of asymmetric unit

Where there are many models to be placed in the asymmetric unit, the signal from the placement of the first model may be buried in noise and the correct placement of this first model only found in the context of all models being placed in the asymmetric unit. One way of tackling this problem has been to use stochastic methods to search the multi-dimensional space (Chang & Lewis, 1997; Kissinger et al., 1999; Glykos & Kokkinidis, 2000). However, we have chosen to use a tree-search-with-pruning approach, where a list of possible placements of the first (and subsequent) models is kept until the placement of the final model. This tree-search-with-pruning search strategy can generate very branched searches that would be challenging for users to negotiate by running separate jobs, but becomes trivial with suitable automation. The search strategy exploits the strength of the maximum likelihood target functions in using prior information in the search for subsequent components in the asymmetric unit.

The tree-search-with-pruning strategy is heavily dependent on the criteria used for selecting the peaks that survive to the next round. Four selection criteria are available in Phaser: selection by percentage difference between the top and mean log-likelihood of the search, selection by Z score, selection by number of peaks, and selection of all peaks. The default is selection by percentage, with the default percentage set at 75%.
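Selection by percentage can be sketched as follows. The cutoff formula used here, mean plus the given percentage of the difference between the top and the mean score, is an assumed reading of "percentage difference between the top and mean log-likelihood"; the function name and the scores are hypothetical.

```python
def select_by_percent(scores, percent=75.0):
    """Keep peaks scoring at least `percent` of the way from the mean to the top.

    Assumed cutoff: mean + (percent / 100) * (top - mean).
    """
    top = max(scores)
    mean = sum(scores) / len(scores)
    cutoff = mean + (percent / 100.0) * (top - mean)
    return [s for s in scores if s >= cutoff]

# One clear peak: only it survives to the next round.
clear = select_by_percent([100, 20, 18, 15, 12, 10])

# No clear signal: all peaks near the top of the list survive.
unclear = select_by_percent([30, 29, 28, 12, 10, 8])

# A peak just below the 75% cutoff is recovered by relaxing it to 65%.
near_miss_75 = select_by_percent([100, 80, 20, 18, 15, 12], percent=75.0)
near_miss_65 = select_by_percent([100, 80, 20, 18, 15, 12], percent=65.0)

print(clear)         # [100]
print(unclear)       # [30, 29, 28]
print(near_miss_75)  # [100]
print(near_miss_65)  # [100, 80]
```

Because the cutoff is defined relative to the spread of scores in each search, the number of surviving peaks adapts to how strong the signal is.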
Selection by percentage has the advantage that, if there is one clear peak standing well above the noise, it alone will be passed to the next round, while if there is no clear signal, all peaks high in the list will be passed as potential solutions to the next round. If structure solution fails, it may be possible to rescue the solution by reducing the percentage cut-off used for selection from 75% to, for example, 65%, so that if the correct peak was just missing the default cut-off, it is now included in the list passed to the next round.

The tree-search-with-pruning search strategy is sub-optimal where there are multiple copies of the same search model in the asymmetric unit. In this case the search generates many branches, each of which has a subset of the complete solution, and so there is a combinatorial explosion in the search. The tree search would only converge onto one branch (solution) with the placement of the last component on each of the branches, but in practice the run time often becomes excessive and the job is terminated before this point can be reached. When searching for multiple copies of the same component in the asymmetric unit, several copies should be added at each search step (rather than branching at each search step), but this search strategy must currently be performed semi-manually as described elsewhere (McCoy, 2007).

3.1.2. Alternative space groups

The space group of a structure can often be ambiguous after data collection. Ambiguities of space group within the one point group may arise from theoretical considerations (if the space group has an enantiomorph) or on experimental grounds (the data along one or more axes were not collected and the systematic absences along these axes cannot be determined). Changing the space group of a structure to another in the same point group can be performed without re-indexing, merging or scaling the data.
Determination of the space group within a point group is therefore an integral part of structure solution by MR. The translation function will yield the highest log-likelihood gain for a correctly packed solution in the correct space group. Phaser allows the user to make a selection of space groups within the same point group for the first translation function calculation in a search for multiple components in the asymmetric unit. If the signal from the placement of the first component is not significantly above noise, the correct space group may not be chosen by this protocol, and the search for all components in the asymmetric unit should be completed separately in all alternative space groups.

3.1.3. Alternative models

As the database of known structures expands, the number of potential MR models is also rapidly increasing. Each available model can be used as a separate search model, or combined with other aligned structures in an ‘ensemble’ model. There are also various ways of editing structures before use as MR models (Schwarzenbacher et al., 2004). The number of MR trials that can be performed thus increases combinatorially with the number of potential models, which makes job tracking difficult for the user. In addition, most users stop performing MR trials as soon as any solution is found, rather than continuing the search until the MR solution with the greatest log-likelihood gain is found, and so they fail to optimize the starting point for subsequent steps in the structure solution pipeline.

The use of alternative models to represent a structure component is also useful where there are multiple copies of one type of component in the asymmetric unit and the different copies have different conformations due to packing differences. The best solution will then have the different copies modelled by different search models; if the conformation change is severe enough, it may not be possible to solve the structure without modelling the differences.
A set of alternative search models may be generated using previously observed conformational differences among similar structures, or, for example, by normal-mode analysis (see §2.3). Phaser automates searches over multiple models for a component, where each potential model is tested in turn before the one with the greatest log-likelihood gain is found. The loop over alternative models for a component is only implemented in the rotation functions, as the solutions passed from the rotation function to the translation function step explicitly specify which model to use as well as the orientation for the translation function in question. 3.1.4. Search order permutation When searching for multiple components in the asymmetric unit, the order of the search can be a factor in success. The models with the biggest component of the total structure factor will be the easiest to find: when weaker scattering components are the subject of the initial search, the solution may be buried in noise and not significant enough to survive the selection criteria in the tree-search-with-pruning search strategy. Once the strongest scattering components are located, then the search for weaker scattering components (in the background of the strong scattering components) is more likely to be a success. Having a high component of the total structure factor correlates with the model representing a high fraction of the total contents of the asymmetric unit, low r.m.s. deviation between model and target atoms, and low B factors for the target to which the model corresponds. Although the first of these (high completeness) can be determined in advance from the fraction of the total molecular weight represented by the model, the second can only be estimated from the Chothia & Lesk (1986 ▶) formula and the third is unknown in advance. If structure solution fails with the search performed in the order of the molecular weights, then other permutations of search order should be tried. 
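The ordering heuristic described above can be sketched in a few lines of Python. This is only an illustration, not Phaser's implementation: the function name `search_orders` and the `(name, molecular_weight)` input format are assumptions made for the example, and molecular weight is used as the only proxy for scattering contribution because it is the one quantity the text says is known in advance.

```python
from itertools import permutations


def search_orders(components):
    """Return the unique search orders for a list of components.

    `components` is a list of (name, molecular_weight) tuples; identical
    copies of a component share the same name.  The first order returned
    is the default one (heaviest component first); the remaining orders
    are the other unique permutations to try if that search fails.
    """
    # Heaviest first: the largest fraction of the scattering is easiest to find.
    by_weight = sorted(components, key=lambda c: c[1], reverse=True)
    names = [name for name, _ in by_weight]
    # dict.fromkeys removes duplicate orders (from identical copies) while
    # keeping the first-seen order, so the weight-sorted order stays first.
    return list(dict.fromkeys(permutations(names)))


if __name__ == "__main__":
    # Hypothetical asymmetric unit: two copies of A and one smaller B.
    for order in search_orders([("A", 50.0), ("B", 20.0), ("A", 50.0)]):
        print(order)
```

With two identical copies of one component, the six raw permutations collapse to three unique search orders, which is why only unique permutations need to be run.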
In Phaser, this possibility is automated on request: the entire search strategy (except for the initial anisotropic data correction) is performed for all unique permutations of search orders. 3.2. Automated experimental phasing SAD is the simplest type of experimental phasing method to automate, as it involves only one crystal and one data set. SAD is now becoming the experimental phasing method of choice, overtaking multiple-wavelength anomalous dispersion because only a single data set needs to be collected. This can help minimize radiation damage to the crystal, which has a major adverse effect on the success of multi-wavelength experiments. The ‘automated experimental phasing’ mode in Phaser takes an atomic substructure determined by Patterson, direct or dual-space methods (Karle & Hauptman, 1956; Rossmann, 1961; Mukherjee et al., 1989; Miller et al., 1994; Sheldrick & Gould, 1995; Sheldrick et al., 2001; Grosse-Kunstleve & Adams, 2003) and refines the positions, occupancies, B factors and f″ values of the atoms to optimize the SAD function, then uses log-likelihood gradient maps to complete the atomic substructure. The flow diagram for this mode is shown in Fig. 2. The search strategy automates two search processes: resolving the ambiguity in the hand of the space group and completing the atomic substructure from log-likelihood gradient maps. A feature of using the SAD function for phasing is that the substructure need not consist only of anomalous scatterers; indeed it can consist of only real scatterers, since the real scattering of the partial structure is used as part of the phasing function. This allows structures to be completed from initial real scattering models. 3.2.1. Enantiomorphic space groups Since the SAD phasing mode of Phaser takes as input an atomic substructure model, the space group of the solution has already been determined to within the enantiomorph of the correct space group. 
Changing the enantiomorph of a SAD refinement involves changing the enantiomorph of the heavy atoms, or in some cases the space group (e.g. the enantiomorphic space group of P41 is P43). In some rare cases (Fdd2, I41, I4122, I41md, I41cd, I-42d, F4132; Koch & Fischer, 1989) the origin of the heavy-atom sites is changed [e.g. the enantiomorphic space group of I41 is I41 with the origin shifted to (1/2, 0, 0)]. If there is only one type of anomalous scatterer, the refinement need not be repeated in both hands: only the phasing needs to be carried out in the second hand to be considered. However, if there is more than one type of anomalous scatterer, then the refinement and substructure completion need to be repeated, as the result will not be enantiomorphically symmetric in the other hand. To facilitate this, Phaser runs the refinement and substructure completion in both hands [as does other experimental phasing software, e.g. Solve (Terwilliger & Berendzen, 1999) and autosharp (Vonrhein et al., 2006)]. The correct space group can then be found by inspection of the electron density maps; the density will only be interpretable in the correct space group. In cases with significant contributions from at least two types of anomalous scatterer in the substructure, the correct space group can also be identified by the log-likelihood gain. 3.2.2. Completing the substructure Peaks in log-likelihood gradient maps indicate the coordinates at which new atoms should be added to improve the log-likelihood gain. In the initial maps, the peaks are likely to indicate the positions of the strongest anomalous scatterers that are missing from the model. As the phasing improves, weaker anomalous scatterers, such as intrinsic sulfurs, will appear in the log-likelihood gradient maps, and finally, if the phasing is exceptional and the resolution high, non-anomalous scatterers will appear, since the SAD function includes a contribution from the real scattering. 
After refinement, atoms are excluded from the substructure if their occupancy drops below a tenth of the highest occupancy amongst those atoms of the same atom type (and therefore f″). Excluded sites are flagged rather than permanently deleted, so that if a peak later appears in the log-likelihood gradient map at this position, the atom can be reinstated and protected from being deleted again. This prevents oscillations in the addition of new sites between cycles, which would otherwise stop the substructure completion algorithm from converging. New atoms are added automatically after a peak and hole search of the log-likelihood gradient maps. The cut-off for the consideration of a peak as a potential new atom is that its Z score be higher than 6 (by default) and also higher than the depth of the largest hole in the map, i.e. the largest hole is taken as an additional indication of the noise level of the map. The proximity of each potential new site to previous atoms is then calculated. If a peak is more than a cut-off distance (κ Å) from a previous site, the peak is added as a new atom with the average occupancy and B factor from the current set of sites. If the peak is within κ Å of an isotropic atom already present, the old atom is made anisotropic. Holes in the log-likelihood gradient map within κ Å of an isotropic atom also cause the atom’s B factor to be switched to anisotropic. However, if the peak or hole is within κ Å of an anisotropic atom already present, the peak or hole is ignored. If a peak is within κ Å of a previously excluded site, the excluded site is reinstated and flagged as not for deletion in order to prevent oscillations, as described above. 
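The peak and hole rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not Phaser's code: the `Site` class and `completion_cycle` function are hypothetical names, distances are plain Cartesian distances in Å (symmetry-equivalent positions are ignored), each peak or hole is compared only with its nearest site, and every hole passed in is assumed to be significant.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    xyz: tuple                 # Cartesian coordinates in Angstrom (assumption)
    occupancy: float = 1.0
    b_iso: float = 20.0
    anisotropic: bool = False
    excluded: bool = False     # flagged after refinement, not deleted
    protected: bool = False    # reinstated once; never exclude again


def _nearest(xyz, sites):
    """Return (site, distance) for the site closest to xyz, or (None, inf)."""
    best = min(sites, key=lambda s: math.dist(xyz, s.xyz), default=None)
    return (best, math.dist(xyz, best.xyz)) if best else (None, math.inf)


def completion_cycle(sites, peaks, holes, kappa, z_cutoff=6.0):
    """Apply one cycle of the peak/hole rules; peaks and holes are (xyz, z)."""
    deepest_hole = max((z for _, z in holes), default=0.0)
    active = [s for s in sites if not s.excluded]
    mean_occ = sum(s.occupancy for s in active) / len(active) if active else 1.0
    mean_b = sum(s.b_iso for s in active) / len(active) if active else 20.0

    new_sites = []
    for xyz, z in peaks:
        # A peak must beat both the Z cut-off and the deepest hole (noise level).
        if z <= z_cutoff or z <= deepest_hole:
            continue
        site, dist = _nearest(xyz, sites)
        if site is None or dist > kappa:
            new_sites.append(Site(xyz, mean_occ, mean_b))   # genuinely new atom
        elif site.excluded:
            site.excluded, site.protected = False, True     # reinstate and protect
        elif not site.anisotropic:
            site.anisotropic = True                         # absorb peak anisotropically
        # else: peak near an anisotropic atom is ignored

    for xyz, _ in holes:
        site, dist = _nearest(xyz, active)
        if site is not None and dist <= kappa and not site.anisotropic:
            site.anisotropic = True
    return sites + new_sites
```

Keeping excluded sites in the list with a flag, rather than deleting them, is what makes the reinstate-and-protect rule possible in this sketch.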
At the end of the cycle of atom addition and isotropic to anisotropic atomic B-factor switching, new sites within 2κ Å of an old atom that is now anisotropic are removed, since the peak may be absorbed by refining the anisotropic B factor; if not, it will be accepted as a new site in the next cycle of log-likelihood gradient completion. The distance κ may be input directly by the user; by default it is the ‘optical resolution’ of the structure (κ = 0.715 d_min), constrained to be no less than 1 Å and no more than 10 Å. If the structure contains more than one significant anomalous scatterer, then log-likelihood gradient maps are calculated from each atom type, the maps compared and the atom type associated with each significant peak assigned from the map with the most significant peak at that location. 3.2.3. Initial real scattering model One of the reasons for including MR and SAD phasing within one software package is the ability to use MR solutions with the SAD phasing target to improve the phases. Since the SAD phasing target contains a contribution from the real scatterers, it is possible to use a partial MR model with no anomalous scattering as the initial atomic substructure used for SAD phasing. This approach is useful where there is a poor MR solution combined with a poor anomalous signal in the data. If the poor MR solution means that the structure cannot be phased from this model alone, and the poor anomalous signal means that the anomalous scatterers cannot be located in the data alone, then using the MR solution as the starting model for SAD phasing may provide enough phase information to locate the anomalous scatterers. The combined phase information will be stronger than from either source alone. 
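The default cut-off distance and the 2κ pruning step from §3.2.2 can be written out as a short sketch. The function names `default_kappa` and `prune_new_sites` are hypothetical, distances are plain Cartesian distances in Å, and "an old atom that is now anisotropic" is read here as any pre-existing atom whose B factor is anisotropic at the end of the cycle; that reading is an assumption.

```python
import math


def default_kappa(d_min):
    """Default cut-off distance in Angstrom: 0.715 * d_min, kept within [1, 10]."""
    if d_min <= 0:
        raise ValueError("d_min must be positive")
    return min(10.0, max(1.0, 0.715 * d_min))


def prune_new_sites(new_sites, anisotropic_old_sites, kappa):
    """Drop new sites lying within 2*kappa of an old, now-anisotropic atom.

    Both arguments are lists of (x, y, z) tuples.  A dropped peak may be
    absorbed by the anisotropic B factor; if it is not, it should reappear
    in the next log-likelihood gradient map.
    """
    return [
        xyz for xyz in new_sites
        if all(math.dist(xyz, old) > 2.0 * kappa for old in anisotropic_old_sites)
    ]


if __name__ == "__main__":
    kappa = default_kappa(2.0)          # data extending to 2.0 Angstrom
    print(round(kappa, 3))
    print(prune_new_sites([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], kappa))
```

The clamp matters at the extremes: very high-resolution data would otherwise give a κ smaller than a bond length, and very low-resolution data a κ that merges unrelated sites.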
To facilitate this method of structure solution, Phaser allows the user to input a partial structure model that will be interpreted in terms of its real scattering only and, following phasing with this substructure, to complete the anomalous scattering model from log-likelihood gradient maps as described above. 3.3. Input and output The fastest and most efficient way, in terms of development time, to link software together is using a scripting language, while using a compiled language is most efficient for intensive computation. Following the lead of the PHENIX project (Adams et al., 2002 ▶, 2004 ▶), Phaser uses Python (http://python.org) as the scripting language, C++ as the compiled language, and the Boost.Python library (http://boost.org/libs/python/) for linking C++ and Python. Other packages, notably X-PLOR (Brünger, 1993 ▶) and CNS (Brünger et al., 1998 ▶), have defined their own scripting languages, but the choice of Python ensures that the scripting language is maintained by an active community. Phaser functionality has mostly been made available to Python at the ‘mode’ level. However, some low-level SAD refinement routines in Phaser have been made available to Python directly, so that they can be easily incorporated into phenix.refine. A long tradition of CCP4 keyword-style input in established macromolecular crystallography software (almost exclusively written in Fortran) means that, for many users, this has been the familiar method of calling crystallographic software and is preferred to a Python interface. The challenge for the development of Phaser was to find a way of satisfying both keyword-style input and Python scripting with minimal increase in development time. Taking advantage of the C++ class structure allowed both to be implemented with very little additional code. Each keyword is managed by its own class. The input to each mode of Phaser is controlled by Input objects, which are derived from the set of keyword classes appropriate to the mode. 
The keyword classes are in turn derived from a CCP4base class containing the functionality for the keyword-style input. Each keyword class has a parse routine that calls the CCP4base class functions to parse the keyword input, stores the input parameters as local variables and then passes these parameters to a keyword class set function. The keyword class set functions check the validity and consistency of the input, throw errors where appropriate and finally set the keyword class’s member parameters. Alternatively, the keyword class set functions can be called directly from Python. These keyword classes are a standalone part of the Phaser code and have already been used in other software developments (Pointless; Evans, 2006 ▶). An Output object controls all text output from Phaser sent to standard output and to text files. Switches on the Output object give different output styles: CCP4-style for compatibility with CCP4 distribution, PHENIX-style for compatibility with the PHENIX interface, CIMR-style for development, XML-style output for developers of automation scripts and a ‘silent running’ option to be used when running Phaser from Python. In addition to the text output, where possible Phaser writes results to files in standard format; coordinates to ‘pdb’ files and reflection data (e.g. map coefficients) to ‘mtz’ files. Switches on the Output object control the writing of these files. 3.3.1. CCP4-style output CCP4-style output is a text log file sent to standard output. While this form of output is easily comprehensible to users, it is far from ideal as an output style for automation scripts. However, it is the only output style available from much of the established software that developers wish to use in their automation scripts, and it is common to use Unix tools such as ‘grep’ to extract key information. For this reason, the log files of Phaser have been designed to help developers who prefer to use this style of output. 
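The keyword-class design described in §3.3 (a shared base class for keyword-style parsing, one class per keyword with a parse routine and a validating set function, and an Input object derived from the relevant keyword classes) can be illustrated in Python. Phaser implements this in C++; every name below (`CCP4Base`, `ResolutionKeyword`, `TitleKeyword`, `ExampleInput` and their methods) is a hypothetical stand-in chosen for the example, not Phaser's actual class or keyword set.

```python
class CCP4Base:
    """Shared keyword-style parsing helper (hypothetical stand-in)."""

    @staticmethod
    def split_keyword(line):
        tokens = line.split()
        if not tokens:
            raise ValueError("empty keyword line")
        return tokens[0].upper(), tokens[1:]


class ResolutionKeyword(CCP4Base):
    def __init__(self):
        super().__init__()
        self.high_resolution = None

    def parse_resolution(self, args):
        # Parse the text form, then hand over to the set function.
        if len(args) != 1:
            raise ValueError("RESOLUTION takes exactly one value")
        self.set_resolution(float(args[0]))

    def set_resolution(self, high):
        # Validation lives here so keyword and scripted input share it.
        if high <= 0:
            raise ValueError("resolution must be positive")
        self.high_resolution = high


class TitleKeyword(CCP4Base):
    def __init__(self):
        super().__init__()
        self.title = ""

    def parse_title(self, args):
        self.set_title(" ".join(args))

    def set_title(self, title):
        if not title.strip():
            raise ValueError("title must not be empty")
        self.title = title


class ExampleInput(ResolutionKeyword, TitleKeyword):
    """Input object for one mode, derived from the keyword classes it needs."""

    def parse(self, text):
        dispatch = {"RESOLUTION": self.parse_resolution, "TITLE": self.parse_title}
        for line in text.splitlines():
            if not line.strip():
                continue
            key, args = self.split_keyword(line)
            if key not in dispatch:
                raise ValueError(f"unknown keyword: {key}")
            dispatch[key](args)
```

Either route reaches the same validation: `ExampleInput().parse("RESOLUTION 2.5")` goes through the parse routine, while a script can call `set_resolution(2.5)` directly, which mirrors how the set functions are described as being callable from Python.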
Phaser prints four levels of log file, summary, log, verbose and debug, as specified by user input. The important output information is in all four levels of file, but it is most efficient to work with the summary output. Phaser prints ‘SUCCESS’ and ‘FAILURE’ at the end of the log file to demarcate the exit state of the program, and also prints the names of any of the other output files produced by the program to the summary output, amongst other features. 3.3.2. XML output XML is becoming commonly used as a way of communicating between steps in an automation pipeline, because XML output can be added very simply by the program author and relatively simply by others with access to the source code. For this reason, Phaser also outputs an XML file when requested. The XML file encapsulates the mark-up within 〈phaser〉 tags. As there is no standard set of XML tags for crystallographic results, Phaser’s XML tags are mostly specific to Phaser but were arrived at after consultation with other developers of XML output for crystallographic software. 3.3.3. Python interface The most elegant and efficient way to run Phaser as part of an automation script is to call the functionality directly from Python. Using Phaser through the Python interface is similar to using Phaser through the keyword interface. Each mode of operation of Phaser described above is controlled by an Input object and its parameter set functions, which have been made available to Python with the Boost.Python library. Phaser is then run with a call to the ‘run-job’ function, which takes the Input object as a parameter. The ‘run-job’ function returns a Result object on completion, which can then be queried using its get functions. The Python Result object can be stored as a ‘pickled’ class structure directly to disk. Text is not sent to standard out in the CCP4 logfile way but may be redirected to another output stream. All Input and Result objects are fully documented. 4. 
Future developments Phaser will continue to be developed as a platform for implementing novel phasing algorithms and bringing the most effective approaches to the crystallographic community. Much work remains to be done formulating maximum likelihood functions with respect to noncrystallographic symmetry, to account for correlations in the data and to consider non-isomorphism, all with the aim of achieving the best possible initial electron density map. After a generation in which Fortran dominated crystallographic software code, C++ and Python have become the new standard. Several developments, including Phaser, PHENIX (Adams et al., 2002 ▶, 2004 ▶), Clipper (Cowtan, 2002 ▶) and mmdb (Krissinel et al., 2004 ▶), simultaneously chose C++ as the compiled language at their inception at the turn of the millennium. At about the same time, Python was chosen as a scripting language by PHENIX, ccp4mg (Potterton et al., 2002 ▶, 2004 ▶) and PyMol (DeLano, 2002 ▶), amongst others. Since then, other major software developments have also started or converted to C++ and Python, for example PyWarp (Cohen et al., 2004 ▶), MrBump (Keegan & Winn, 2007 ▶) and Pointless (Evans, 2006 ▶). The choice of C++ for software development was driven by the availability of free compilers, an ISO standard (International Standardization Organization et al., 1998 ▶), sophisticated dynamic memory management and the inherent strengths of using an object-oriented language. Python was equally attractive because of the strong community support, its object-oriented design, and the ability to link C++ and Python through the Boost.Python library or the SWIG library (http://www.swig.org/). Now that a ‘critical mass’ of developers has taken to using the new languages, C++ and Python are likely to remain the standard for crystallographic software for the current generation of crystallographic software developers. 
Phaser source code has been distributed directly by the authors (see http://www-structmed.cimr.cam.ac.uk/phaser for details) and through the PHENIX and CCP4 (Collaborative Computing Project, Number 4, 1994 ▶) software suites. The source code is released for several reasons, including that we believe source code is the most complete form of publication for the algorithms in Phaser. It is hoped that generous licensing conditions and source distribution will encourage the use of Phaser by other developers of crystallographic software and those writing crystallographic automation scripts. There are no licensing restrictions on the use of Phaser in macromolecular crystallography pipelines by other developers, and the license conditions even allow developers to alter the source code (although not to redistribute it). We welcome suggestions for improvements to be incorporated into new versions. Compilation of Phaser requires the computational crystallography toolbox (cctbx; Grosse-Kunstleve & Adams, 2003 ▶), which includes a distribution of the cmtz library (Winn et al., 2002 ▶). The Boost libraries (http://boost.org/) are required for access to the functionality from Python. Phaser runs under a wide range of operating systems including Linux, Irix, OSF1/Tru64, MacOS-X and Windows, and precompiled executables are available for these platforms when only keyword-style access (and not Python access) is required. Graphical user interfaces to Phaser are available for both the PHENIX and the CCP4 suites. User support is available through PHENIX, CCP4 and from the authors (email cimr-phaser@lists.cam.ac.uk).

Cited 2836 times

Record: found
Abstract: found
Article: not found

Influenza Virus Transmission Is Dependent on Relative Humidity and Temperature

Anice C. Lowen, Samira Mubareka, John Steel … (2007)

Introduction Influenza A virus, of the family Orthomyxoviridae, carries an RNA genome consisting of eight segments of negative-stranded RNA. This genome encodes one or two non-structural proteins and nine structural proteins, which, together with a host cell–derived lipid envelope, comprise the influenza virus particle. Influenza virus causes widespread morbidity and mortality among human populations worldwide: in the United States alone, an average of 41,400 deaths and 1.68 million hospitalizations [1] are attributed to influenza each year. In temperate regions like the United States, this impact is felt predominantly during the winter months; that is, epidemics recur with a highly predictable seasonal pattern. In northern latitudes, influenza viruses circulate from November to March, while in the southern hemisphere influenza occurs primarily from May to September [2]. Tropical regions, by contrast, experience influenza throughout the year, although increased incidence has been correlated with rainy seasons [2,3]. Despite extensive documentation of the seasonal cycles of influenza and curiosity as to their causes, little concrete data is available to indicate why influenza virus infections peak in the wintertime. Theories to explain the seasonal variation of influenza have therefore proliferated over the years (reviewed in [4]). Current hypotheses include fluctuations in host immune competence mediated by seasonal factors such as melatonin [5] and vitamin D [6] levels; seasonal changes in host behavior, such as school attendance, air travel [7], and indoor crowding during cold or rainy weather; and environmental factors, including temperature [8], relative humidity (RH), and the direction of air movement in the upper atmosphere [9]. In early studies using mouse-adapted strains of influenza virus, experiments performed in the winter months yielded a transmission rate of 58.2%; in contrast, a rate of only 34.1% was observed in the summer months [10]. 
While these data suggested that the seasonal influences acting on humans also affect laboratory mice, no mechanism to explain the observations was identified. Herein, we directly tested the hypotheses that ambient air temperature and RH impact the efficiency with which influenza virus is spread. As a mammalian animal model we used Hartley strain guinea pigs, which we have recently shown to be highly susceptible to infection with human influenza viruses [11]. Importantly, we also found that naïve guinea pigs readily become infected when exposed to inoculated guinea pigs, unlike mice, which do not efficiently transmit influenza virus [11]. Thus, by housing infected and naïve guinea pigs together in an environmental chamber, we were able to assess the efficiency of transmission under conditions of controlled RH and temperature. Our data show that both RH and temperature do indeed affect the frequency of influenza virus transmission among guinea pigs, although via apparently differing mechanisms. Results Twenty replicate experiments were performed in which all factors remained constant except for the RH and/or temperature inside the environmental chamber. Each experiment involved eight guinea pigs, and transmission under each set of conditions was assessed in duplicate. The arrangement of animals in the environmental chamber is illustrated in Figure 1. Virus contained in nasal wash samples collected on alternating days post-inoculation (p.i.) was titrated by plaque assay to determine the infection status of each animal. Serum samples were collected from each animal prior to infection and on day 17 p.i., and seroconversion was assessed by hemagglutination inhibition assay (results in Table S1). Figure 1 Arrangement of Infected and Exposed Guinea Pigs in Environmental Chamber In each experiment, eight animals were housed in a Caron 6030 environmental chamber. Each guinea pig was placed in its own cage, and two cages were positioned on each shelf. 
Naïve animals were placed behind infected animals, such that the direction of airflow was toward the naïve animals. The cages used were open to airflow through the top and one side, both of which were covered by wire mesh. Although infected and exposed guinea pigs were placed in pairs, air flowed freely between shelves, allowing transmission to occur from any infected to any naïve animal. In general, the behavior (level of activity, food and water consumption, symptoms of infection) of guinea pigs was not observed to change with the ambient relative humidity. Likewise, animals housed at 5 °C behaved in a similar manner to those housed at 20 °C. Guinea pigs kept at 30 °C consumed more water than those housed under cooler conditions, and appeared lethargic. Consistent with our previous observations [11], influenza virus–infected guinea pigs did not display detectable symptoms of disease (e.g., weight loss, fever, sneezing, coughing) during the experiments described. Transmission Efficiency Is Dependent on Relative Humidity The results of transmission experiments performed at 20 °C and five different RHs (20%, 35%, 50%, 65%, and 80%) indicated that the efficiency of aerosol spread of influenza virus varied with RH. Transmission was highly efficient (occurred to three or four of four exposed guinea pigs) at low RH values of 20% or 35%. At an intermediate RH of 50%, however, only one of four naïve animals contracted infection. Three of four exposed guinea pigs were infected at 65% RH, while no transmission was observed at a high RH of 80% (Figure 2). Where transmission was observed, the kinetics with which infection was detected in each exposed animal varied between and within experiments. To an extent, we believe this variation is due to the stochastic nature of infection. 
However, while most infection events were the product of primary transmission from an inoculated animal, others could be the result of secondary transmission from a previously infected, exposed guinea pig. With the exception of the lack of transmission at 80% RH, the observed relationship between transmission and RH is similar to that between influenza virus stability in an aerosol and RH [12], suggesting that at 20 °C the sensitivity of transmission to humidity is due largely to virus stability. Figure 2 Transmission of Influenza Virus from Guinea Pig to Guinea Pig Is Dependent on Relative Humidity Titers of influenza virus in nasal wash samples are plotted as a function of day p.i. Overall transmission rate and the RH and temperature conditions of each experiment are stated underneath the graph. Titers from intranasally inoculated guinea pigs are represented as dashed lines; titers from exposed guinea pigs are shown with solid lines. Virus titrations were performed by plaque assay on Madin Darby canine kidney cells. Transmission Efficiency Is Inversely Correlated with Temperature To test whether cold temperatures would increase transmission, the ambient temperature in the chamber was lowered to 5 °C and experiments were performed at 35%–80% RH. Overall, transmission was more efficient at 5 °C: 75%–100% transmission occurred at 35% and 50% RH, and 50% transmission was observed at 65% and 80% RH (Figure 3A–3H). The statistical significance of differences in transmission rates at 5 °C compared to 20 °C was assessed using the Fisher's exact test. While at 35% and 65% RH the difference was not found to be significant, at both 50% and 80% RH, transmissibility at 5 °C was found to be greater than that at 20 °C (p 20 °C) and either intermediate (50%) or high (80%) RHs. Materials and Methods Virus. Influenza A/Panama/2007/99 virus (Pan/99; H3N2) was kindly supplied by Adolfo García-Sastre and was propagated in Madin Darby canine kidney cells. Animals. 
Female Hartley strain guinea pigs weighing 300–350 g were obtained from Charles River Laboratories. Animals were allowed free access to food and water and kept on a 12-h light/dark cycle. Guinea pigs were anesthetized for the collection of blood and of nasal wash samples, using a mixture of ketamine (30 mg/kg) and xylazine (2 mg/kg), administered intramuscularly. All procedures were performed in accordance with the Institutional Animal Care and Use Committee guidelines. During guinea pig transmission experiments, strict measures were followed to prevent aberrant cross-contamination between cages: sentinel animals were handled before inoculated animals, gloves were changed between cages, and work surfaces were sanitized between guinea pigs. Transmission experiments. The term “aerosol” is used herein to describe respiratory droplets of all sizes. The term “droplet nuclei” is used to refer to droplets that remain airborne (typically less than 5 μm in diameter). Each transmission experiment involved eight guinea pigs. On day 0, four of the eight guinea pigs were inoculated intranasally with 10³ PFU of influenza A/Panama/2007/99 virus (150 μl per nostril in phosphate buffered saline [PBS] supplemented with 0.3% bovine serum albumin [BSA]) and housed in a separate room from the remaining animals. At 24 h p.i., each of the eight guinea pigs was placed in a “transmission cage”, a standard rat cage (Ancare R20 series) with an open wire top, which had been modified by replacing one side panel with a wire grid. The transmission cages were then placed into the environmental chamber (Caron model 6030) with two cages per shelf, such that the wire grids opposed each other (Figure 1). In this arrangement, the guinea pigs cannot come into physical contact with each other. Each infected animal was paired on a shelf with a naïve animal. The guinea pigs were housed in this way for 7 d, after which they were removed from the chamber and separated. On day 2 p.i. 
(day 1 post-exposure) and every second day thereafter up to day 12 p.i., nasal wash samples were collected from anesthetized guinea pigs by instilling 1 ml of PBS-BSA into the nostrils and collecting the wash in a Petri dish. Titers in nasal wash samples were determined by plaque assay of 10-fold serial dilutions on Madin Darby canine kidney cells. Serum samples were collected from each animal prior to infection and on day 17 post-infection, and seroconversion was assessed by hemagglutination inhibition assay. All transmission experiments reported herein were performed between September 2006 and April 2007. Analysis of expression levels of mediators of innate immunity. Guinea pigs were inoculated with 10³ PFU of Pan/99 virus intranasally and immediately housed under the appropriate conditions (5 °C or 20 °C and 35% RH). At days 1, 2, 3, 5, and 7 post-infection, three guinea pigs were killed and their nasal turbinates removed. Tissues were placed immediately in RNAlater reagent (Qiagen), and stored at 4 °C for 1 to 5 d. RNA was extracted from equivalent masses of tissue using the RNeasy Protect Mini kit (Qiagen) and subjected to DNase treatment (Qiagen). One microgram of RNA was subjected to reverse transcription using MMLV reverse transcriptase (Roche). One microlitre of the resultant product was used as the template in a SYBR green (Invitrogen) real-time PCR assay (Roche Light Cycler 480) with Ampli-taq Gold polymerase (Perkin-Elmer). 
Primers used were as follows: β-actin f AAACTGGAACGGTGAAGGTG; β-actin r CTTCCTCTGTGGAGGAGTGG; Mx1 f CATCCCYTTGrTCATCCAGT; Mx1 r CATCCCyTTGRTCATCCAGT; MDA-5 f GAGCCAGAGCTGATGARAGC; MDA-5 r TCTTATGWGCATACTCCTCTGG; IL-1β f GAAGAAGAGCCCATCGTCTG; IL-1β r CATGGGTCAGACAACACCAG; RANTES f GCAATGCTAGCAGCTTCTCC; RANTES r TTGCCTTGAAAGATGTGCTG; TLR3 f TAACCACGCACTCTGTTTGC; TLR3 r ACAGTATTGCGGGATCCAAG; TNFα f TTCCGGGCAGATCTACTTTG; TNFα r TGAACCAGGAGAAGGTGAGG; MCP-1 f ATTGCCAAACTGGACCAGAG; MCP-1 r CTACGGTTCTTGGGGTCTTG; MCP-3 f TCATTGCAGTCCTTCTGTGC; MCP-3 r TAGTCTCTGCACCCGAATCC; IFNγ f GACCTGAGCAAGACCCTGAG; IFNγ r TGGCTCAGAATGCAGAGATG; STAT1 f AAGGGGCCATCACATTCAC; STAT1 r GCTTCCTTTGGCCTGGAG; TBK1 f CAAGAAACTyTGCCwCAGAAA; TBK1 r AGGCCACCATCCAykGTTA; IRF5 f CAAACCCCGaGAGAAGAAG; IRF5 r CTGCTGGGACtGCCAGA; IRF7 f TGCAAGGTGTACTGGGAGGT; IRF7 r TCACCAGGATCAGGGTCTTC (where R = A or G, Y = C or T, W = A or T, K = T or G). Primer sequences were based either on guinea pig mRNA sequences available in GenBank (MCP-1, MCP-3, IL-1β, IFNγ, RANTES, TLR3, TNFα, and β-actin), or on the consensus sequence of all species available in GenBank (Mx1, MDA-5, IRF5, IRF7, STAT1, and TBK1). Sequencing of each PCR product indicated that all primer pairs were specific for the expected transcript. Reactions were performed in duplicate and normalized by dividing the mean value of 2^Ct for β-actin (the cycle threshold, Ct, expressed as an exponent of 2) by the mean value of 2^Ct for the target gene. The fold induction over mock-infected samples was then calculated by dividing the normalized value by the normalized mock value. Data are represented in Figure 5 as the mean of three like samples (nasal turbinates harvested on the same day p.i. from three guinea pigs) ± standard deviation. Statistical analyses. Statistical analyses were performed using GraphPad Prism 5 software. 
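The real-time PCR normalization described above amounts to two divisions, sketched here in Python. The function names are hypothetical, the Ct values in the usage example are invented purely for illustration (they are not data from the study), and the sketch assumes that "mean value" means the mean of 2^Ct over the duplicate reactions.

```python
from statistics import mean


def normalized_expression(ct_actin, ct_target):
    """Mean 2^Ct of beta-actin divided by mean 2^Ct of the target gene.

    Each argument is a list of Ct values from replicate reactions.  A lower
    target Ct (more transcript) gives a larger normalized value.
    """
    return mean(2.0 ** ct for ct in ct_actin) / mean(2.0 ** ct for ct in ct_target)


def fold_induction(infected_actin, infected_target, mock_actin, mock_target):
    """Normalized expression in the infected sample relative to mock."""
    infected = normalized_expression(infected_actin, infected_target)
    mock = normalized_expression(mock_actin, mock_target)
    return infected / mock


if __name__ == "__main__":
    # Hypothetical duplicate Ct values, for illustration only.
    print(fold_induction([20.0, 20.0], [22.0, 22.0], [20.0, 20.0], [25.0, 25.0]))
```

When the duplicates agree exactly, the normalized value reduces to 2^(Ct of β-actin minus Ct of target), so a target that crosses threshold three cycles earlier in infected tissue than in mock tissue corresponds to an eightfold induction.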
Supporting Information
Table S1. Seroconversion of Inoculated and Exposed Guinea Pigs. Results of hemagglutination inhibition tests for each transmission experiment are shown. (58 KB DOC)
Accession Numbers
The GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html) accession numbers of guinea pig genes used for primer design are as follows: β-actin (AF508792.1); IFNγ (AY151287.1); IL-1β (AF119622); MCP-1 (L04985); MCP-3 (AB014340); RANTES (CPU77037); TLR3 (DQ415679.1); and TNFα (CPU77036).

Record: found
Abstract: found
Article: found

Is Open Access

Recent developments in classical density modification

Kevin Cowtan (2010)

1. Background

Phase improvement by density modification has become a routine part of the process of structure solution using experimental phases and is often also used after molecular replacement. There are two families of approaches to density modification: ‘classical’ methods, which iterate modifications to the electron-density map in real space with the reintroduction of the experimental observations in reciprocal space, and ‘statistical’ methods, which construct a probability distribution for the electron-density values as a function of position in real space and transform this distribution to obtain a probability distribution for the phases in reciprocal space.

1.1. Classical density modification

Classical density-modification methods have provided a convenient tool for the rapid calculation of ‘improved’ electron-density maps for more than 15 years and have been employed in a number of forms, with the common feature of alternating steps being performed in real and reciprocal space. The calculation commonly proceeds as follows. Starting with a set of experimentally observed structure-factor magnitudes and estimated phase probability distributions, a ‘best’ electron-density map is calculated using the centroid of the phase probability distribution to provide a phase and weight for the structure-factor magnitude. This initial electron-density map is then modified to make it conform more closely to the features expected of a well phased electron-density map. The most common modifications are as follows.
(i) Solvent flattening (Wang, 1985). Features in the solvent region are flattened under the assumption that noise arising from errors in the phases provides a significant contribution to such features.
(ii) Histogram matching (Zhang et al., 1997). The histogram of electron-density values for a well phased map differs from the histogram for a randomly phased map.
The application of a nonlinear rescaling to the electron density allows the electron-density map to be modified so that its histogram looks more like that of a well phased map. This process tends to sharpen electron-density peaks and suppress negative density.
(iii) Noncrystallographic symmetry (NCS) averaging. In cases where there are several copies of a molecule in the asymmetric unit, the related electron-density values between the molecules may be averaged to improve the signal-to-noise ratio and impose restraints on the phase values.

The modified map is then back-transformed, leading to a new set of Fourier coefficients which differ in both magnitude and phase from those used to calculate the initial map. An error estimate is calculated for each phase, usually on the basis of how well the modified magnitudes match the observed values in a particular resolution shell. This error estimate is used to construct a phase probability distribution centred about the modified phase. This phase probability is multiplied by the phase probability distribution from the experimental phasing to provide an updated distribution. The new distribution can be used to calculate an electron-density map for model building or can be used to start a new cycle of density modification.

This basic scheme has been implemented, with some refinements and variations, in the DM (Cowtan et al., 2001) and SOLOMON (Abrahams & Leslie, 1996) software, as well as in many other packages. The DM software initially implemented solvent flattening, histogram matching and NCS averaging, along with likelihood error estimation using the σA method (Read, 1986). The SOLOMON software pioneered the use of weighted NCS averaging and also the use of solvent flipping to reduce bias, which was later implemented in DM in the form of the ‘perturbation’ gamma correction (Abrahams, 1997; Cowtan, 1999).
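The two simplest real-space modifications listed above, solvent flattening and histogram matching, can be illustrated on a flat list of density values. This is a minimal sketch under stated assumptions, not the DM or SOLOMON implementation: the function names are hypothetical, the solvent mask is supplied by the caller, and histogram matching is done here by plain rank (quantile) mapping rather than by the binned scheme of Zhang et al.

```python
def solvent_flatten(values, is_solvent):
    """Replace every solvent grid value by the mean solvent density."""
    solvent = [v for v, s in zip(values, is_solvent) if s]
    level = sum(solvent) / len(solvent)
    return [level if s else v for v, s in zip(values, is_solvent)]


def histogram_match(values, target):
    """Nonlinear rescaling: give each value the target value of equal rank."""
    n, m = len(values), len(target)
    target_sorted = sorted(target)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Index into the target distribution at the same quantile.
        j = min(m - 1, (rank * m) // n)
        out[i] = target_sorted[j]
    return out


# Hypothetical toy map: two solvent points followed by two protein points.
flat = solvent_flatten([0.1, -0.3, 0.8, 0.2], [True, True, False, False])
matched = histogram_match([0.8, 0.2, 0.5], [0.0, 1.0, 2.0])
print(flat, matched)
```

The rank mapping preserves the ordering of the density values while forcing their distribution onto the target, which is the sense in which the rescaling is nonlinear.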
One distinct technique which is not described here is the use of density modification for resolution extrapolation beyond the limit of the observed data. Pioneered by Caliandro et al. (2005) and more widely used in the software of Sheldrick (Usón et al., 2007), this approach can provide significant additional phase improvement, especially when the data already extend to better than 2 Å resolution.

1.2. Statistical density modification

Statistical density-modification methods provide a sounder theoretical basis for the problem of phase improvement and as a result reduce the problems of bias associated with classical density-modification methods. This improvement is achieved in two ways.
(i) By the expression of the additional information to be introduced to the electron-density map in terms of probability distributions and then carrying those distributions into reciprocal space, rather than working with a single map representing a single sample from the phase probability distributions.
(ii) By weakening the link between the additional information to be introduced and the initial phases, thus reducing the bias introduced in a single cycle of phase improvement. Since the current centroid map is not used as the basis for phase improvement, the phase probability distributions from which the centroid map is derived are not directly included in the new phase information incorporated during a single density-modification cycle. The only way in which the current phases are used is in the classification of the asymmetric unit into regions of different density types, e.g. solvent and protein.

The result of these two changes is that statistical density-modification techniques lead to reduced phase bias and more realistic estimates of the figures of merit. The resulting method has been implemented in the RESOLVE software (Terwilliger, 1999).
In addition to its application to conventional density-modification problems, it has been particularly effective in removing bias from maps phased from an atomic model through the ‘prime-and-switch’ approach (Terwilliger, 2004). An alternative implementation in a program called Pirate (Cowtan, 2000) has been employed successfully in a number of cases, but delivers poor results in other cases for reasons which have yet to be determined.

1.3. Limitations of current methods

Statistical phase-improvement methods, and in particular the RESOLVE software, have made a substantial contribution to the field of phase improvement, significantly reducing the problem of bias and additionally providing tools for removing bias from existing phasing. Current implementations are also highly automated, making them particularly suitable for use in structure-solution pipelines. The only significant limitation of these approaches is the computational overhead, with calculations taking minutes rather than seconds. During the rise of statistical methods, classical density-modification techniques have been neglected to some extent, most notably in the implementation of automation features. However, another effect of this neglect has been a failure to implement a number of algorithms which are now routine in other steps of the structure-solution pipeline. The aim of this work is to produce a classical density-modification method that is updated to incorporate both automation features and the latest applicable algorithms. Where it has been convenient to do so, direct comparisons have been made to demonstrate the effect of updating each step of the process. The resulting algorithm retains the speed benefits of classical density-modification techniques; it is hoped that this will render it suitable for interactive use from within graphical model-building programs, for example in Coot (Emsley & Cowtan, 2004).

2. Methods

The density-modification algorithm described here follows closely the outline of classical methods and in particular the approach implemented in the DM software; however, the detailed implementation of some of the steps has been altered. Specifically, the calculation consists of some data-preparation steps followed by a loop in which the data manipulations occur successively in real and reciprocal space. The calculation involves the following steps.
(i) Perform an anisotropy correction on the input structure factors.
(ii) (Optional) Estimate the solvent content from the sequence.
(iii) (Optional) Calculate NCS operators from heavy-atom coordinates or from an atomic model.
(iv) Cycle over the following steps a specified number of times.
(1) Simulate electron-density histograms for the ordered region of the asymmetric unit using a known structure.
(2) Calculate an electron-density map using centroid phases and weights based on the current phase probability distributions.
(3) Calculate a solvent mask covering the required volume of the unit cell.
(4) (Optional) Prepare an NCS map consisting of the contributions from other NCS copies to each position in the asymmetric unit.
(5) Prepare a perturbed map from the initial map by adding a small random signal.
(6) Density-modify the initial map by applying the NCS contributions, solvent flattening and histogram matching.
(7) Density-modify the perturbed map by applying the NCS contributions, solvent flattening and histogram matching.
(8) Compare the two modified maps to estimate the gamma correction required.
(9) Apply the gamma correction to the modified unperturbed map.
(10) Back-transform to obtain a set of modified magnitudes and phases.
(11) Calculate an error model by optimizing the likelihood of the observed data given the calculated data and error model parameters (i.e. a σA-type calculation).
(12) Use the error model to calculate updated Hendrickson–Lattman coefficients and 2mFo − DFc-type map coefficients.

The general steps of the calculation are very similar to those employed in the DM software. In particular, the gamma-correction calculation is the perturbation gamma method from Cowtan (1999), with the exception that the perturbation calculation is performed in real rather than reciprocal space. The solvent-flattening and histogram-matching calculations are identical to those described by Zhang et al. (1997). The solvent mask-determination algorithm is identical to that employed by Abrahams & Leslie (1996) in the SOLOMON software.

The principal differences from the methods mentioned above are as follows.
(i) Problem-specific histogram simulation using a known structure.
(ii) Use of prior phase information in the calculation of figures of merit and map coefficients.
(iii) Application of anisotropy correction to the data.
(iv) Pairwise weighted noncrystallographic symmetry averaging.
These will be discussed in turn in the following sections.

2.1. Problem-specific histogram simulation from a known structure

The implementation of histogram matching in the DM software depended on the use of a standard library of protein histograms calculated from known structures. However, the electron-density histogram is strongly dependent on both the resolution and the Wilson B factor of the data. As a result, in order for this procedure to work it was necessary to rescale the data to match the B factor of the histogram data set before calculating the electron-density map. For simplicity, the overall Wilson B factor was removed from the source data before calculating the reference histogram libraries (i.e. using maps for a pseudo-stationary atom structure) and the working data were also sharpened using a method documented by Cowtan & Main (1998).
The use of a sharpened map potentially introduces additional noise arising from the lower signal-to-noise ratio and poorer phasing of the high-resolution reflections. A better approach is to calculate histograms appropriate to the current problem by matching the resolution and temperature factor of the source data sets from which the histogram is obtained to those of the data from the structure to be solved. The modified source data will then yield histograms that are appropriate to the current problem. (If desired, the data for the unknown structure can also be sharpened or smoothed beforehand.) This approach has been implemented by providing a solved reference structure with observed structure factors and calculated phases from which the software can generate an appropriate histogram library on the fly. The choice of the reference structure does not appear to be critical for normal problems; however, the user can optionally provide their own reference structure if there is a good reason to do so. The figures of merit may also vary systematically as a function of resolution: they will normally be lower at high resolutions. If this contribution is ignored, the electron-density histogram for the reference structure will be systematically sharper than the electron-density histogram for the work structure. Using an over-sharp histogram for histogram matching will tend to up-weight the high-resolution terms, for which the phases are usually worst. The protein density-histogram library is therefore calculated in the following way. The structure factors and phases from the refined model for the reference structure are read into the program, along with the structure factors for the unsolved work structure. The resolution of the reference structure is truncated to match the work structure. 
The reference-structure data are rescaled with a resolution-dependent scale function (using a smooth-spline scaling following the method of Cowtan, 2002) to match the scale of the work structure data; this resolution-dependent scaling effectively matches the Wilson B factors. The effect of the resolution-dependence of the figures of merit is also simulated by creating synthetic figures of merit for the rescaled reference structure factors, matching the resolution distribution for the work structure factors. These synthetic figures of merit are used as weights in the calculation of the electron-density map for the reference structure. The known atomic model for the reference structure is then used to calculate a solvent mask and electron-density histograms from the protein region of the simulated map. The resulting histogram may then be used as a target histogram for histogram matching the work map, following the method of Zhang et al. (1997).

2.2. Use of prior phase information in the calculation of figures of merit and map coefficients

After the application of techniques such as solvent flattening and histogram matching to the electron density, an inverse Fourier transform is used to obtain a new set of magnitudes and phases. These are then used to update the phase probability distributions arising from the original experimental phasing calculation. Most previous density-modification algorithms, including DM, SOLOMON and CNS (Cowtan et al., 2001; Abrahams & Leslie, 1996; Brünger et al., 1998), have adopted a two-stage approach to this problem. In the first step, an estimate of the reliability of the modified phases is made on the basis of the agreement between the modified magnitudes and the observed structure factors.
The reasoning behind this approach comes from analogy with the problem of calculating map coefficients using a partial structure including both errors and missing atoms and is based on the fact that the size of the discrepancy in the structure-factor magnitudes is a good indicator of the error in the phases. Once an estimate of the error in the phases has been obtained, a phase probability distribution is constructed from the modified phase and estimated error. This phase probability distribution is multiplied by the experimental phase probability distribution to provide an updated distribution. [The distributions are usually represented in terms of Hendrickson–Lattman coefficients (Hendrickson & Lattman, 1970) and so this multiplication is performed as a simple addition of coefficients.] Map coefficients may also be calculated for ‘best’ and ‘difference’ electron-density maps.

To be more specific, the true structure factor is accounted for by two components: a portion of the calculated structure factor (reduced in magnitude because of the errors in the model) and an unknown portion which is represented by a two-dimensional Gaussian in the Argand diagram centred on the reduced calculated structure factor. This approach was developed by Read (1986) (using the terms D and σA for the scale term and the width of the Gaussian). The error and scale terms are related and are calculated in resolution shells. An alternative implementation using spline coefficients to provide a smooth variation with resolution has been described in Cowtan (2002) (using the terms s and ω for the scale term and the width of the Gaussian).

The approach adopted here is to include the prior experimental phase probability distribution into the calculation of the phase probability distribution for the modified phase and in doing so obtain improved estimates of the scale and error terms.
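The remark that multiplying phase probability distributions amounts to adding Hendrickson–Lattman coefficients can be checked numerically. The sketch below assumes the standard form P(ϕ) ∝ exp(A cos ϕ + B sin ϕ + C cos 2ϕ + D sin 2ϕ); the function names and the coefficient values are hypothetical and do not come from Parrot or DM.

```python
import math


def phase_distribution(hl, n=360):
    """Sample the normalized P(phi) for HL coefficients (A, B, C, D)."""
    a, b, c, d = hl
    phis = [2.0 * math.pi * k / n for k in range(n)]
    logs = [a * math.cos(p) + b * math.sin(p)
            + c * math.cos(2 * p) + d * math.sin(2 * p) for p in phis]
    top = max(logs)  # subtract the maximum to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return phis, [w / total for w in weights]


def combine(hl_1, hl_2):
    """Multiply two distributions by adding their HL coefficients."""
    return tuple(x + y for x, y in zip(hl_1, hl_2))


def centroid(hl):
    """Return the centroid phase (radians) and the figure of merit."""
    phis, prob = phase_distribution(hl)
    x = sum(p * math.cos(phi) for phi, p in zip(phis, prob))
    y = sum(p * math.sin(phi) for phi, p in zip(phis, prob))
    return math.atan2(y, x), math.hypot(x, y)


# Hypothetical coefficients: experimental phase near 0, modified phase near 90 degrees.
experimental = (1.0, 0.0, 0.0, 0.0)
modified = (0.0, 1.0, 0.0, 0.0)
phase, fom = centroid(combine(experimental, modified))
print(math.degrees(phase), fom)
```

With two equally weighted unimodal sources the combined centroid phase falls midway between them, and the figure of merit exceeds that of either source alone, which is the expected behaviour when independent phase information is combined.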
In addition, the updated phase probability distribution and the electron-density map coefficients are obtained directly as part of the same calculation. The method followed is almost identical to that of Cowtan (2005), with the following difference. The underlying expression for the probability of a phase includes both the contribution from the calculated structure factor (scaled by a factor s with a Gaussian error term of width ω; see Fig. 1) and the contribution from the Hendrickson–Lattman coefficients. In this expression d is the difference between the vectors (sF_c, ϕ_c) and (F_o, ϕ), i.e. $d^2 = |F_o|^2 + s^2|F_c|^2 - 2|F_o|\,s|F_c|\cos(\phi - \phi_c)$. This neglects the contribution of the error in the observed F, i.e. σ_F. In the previous approach, σ_F was used to increment the width of the Gaussian error term ω. This is no longer strictly correct, although when the phase errors in the model dominate (for example in the case of density modification, as contrasted with the very final stages of refinement) it is a good approximation.

In order to estimate s and ω, the unknown phase must be integrated out. Integrating the above expression and eliminating constant factors gives rise to the likelihood function, equation (2). The logarithm of this function and its derivatives, summed over all reflections by resolution, are evaluated and used to determine maximum-likelihood estimates for s and ω. As with the likelihood refinement target adopted by Pannu et al. (1998), the difference map (i.e. mFo − DFc-like) coefficients may be obtained by calculating the gradient of the logarithm of the likelihood function (2) with respect to the calculated structure factor and adjusting the scale to match that of the centroid map. The ‘best’ (i.e. 2mFo − DFc-like) map is obtained by adding the centroid and difference maps. This map is used as a starting point for subsequent cycles of density modification.

2.3. Application of an anisotropy correction to the data

Anisotropy in the X-ray diffraction data can lead to similar groups of atoms which look very different in the electron-density map depending on their orientation with respect to the anisotropy of the data. This can affect the density-modification calculation in a number of ways, most notably in estimation of the solvent envelope and in the electron-density histogram of the data. The effects of anisotropy can be reduced by applying an anisotropy correction to the data to enhance the structure factors along directions in which they are weaker (although this does not correct for an anisotropic resolution limit) and this technique has been applied effectively even without an atomic model in programs such as Phaser (McCoy, 2007; Read, 2008).

An anisotropy correction has been implemented to adjust the input structure factors before the calculation of the first electron-density map. To estimate the anisotropy of the input data, E values are calculated from the observed structure factors. An anisotropic Gaussian is then determined which best fits the E values to the expected value of 1. In order to maintain the speed of the calculation, the scale is estimated by fitting a general quadratic in three dimensions to the logarithm of the E values, which is a linear rather than nonlinear calculation and thus does not require iteration. The anisotropy correction is therefore obtained by minimizing a residual expressed in terms of the reciprocal orthogonal coordinate corresponding to each reflection index and of U, the symmetric matrix of anisotropy coefficients. This approach does not account for the experimental uncertainties and gives different weights to reflections of different magnitudes, but tests using both simulated and real data give similar results to the more thorough approach adopted in REFMAC (Murshudov et al., 1997).

2.4. Pairwise weighted noncrystallographic symmetry averaging

The concept of weighted NCS averaging was introduced by Abrahams & Leslie (1996) to deal with a case in which different parts of the structure obeyed the NCS relationships to different degrees. This was achieved by use of a ‘weighted averaging mask’; instead of having values of 0 (for unrelated regions of the map) or 1 (for NCS-related regions of the map), Abrahams’ mask could take values in a continuous range between 0 and 1 representing different levels of agreement. In regions where the mask value was less than 1, the weight of the NCS-related density would be less than the weight of the original density at that position in the map.

The approach described here extends this work by the introduction of multiple masks, with one mask for each pair of NCS-related density regions. Thus, in the case of threefold symmetry between molecules A, B and C there are six masks: those relating molecules A–B, A–C, B–A, B–C, C–A and C–B. This allows for the case where some pairs of molecules may be more similar than others. For example, if each of the molecules A, B and C has two domains, α and β, both domains may be similar in molecules A and B but domain β may be missing in molecule C. In this case a different mask is required when averaging between molecules A and B as opposed to averaging either of these with molecule C.

Previous implementations (e.g. Vellieux et al., 1995) have calculated a mask covering the NCS-related region at the beginning of the density-modification calculation and then stored this mask for use during the rest of the calculation; however, with so many masks this becomes inconvenient. Instead, the masks are calculated on the fly as they are required, using a highly optimized FFT-based approach. To calculate the mask relating molecules A and B, two maps are calculated covering a spherical region of at least four asymmetric unit volumes about the estimated centre of molecule A.
The first map contains the unrotated density for molecule A and the second contains the density from molecule B rotated back into the same orientation as molecule A. Both these maps are subsampled to 1/3 of the sampling (i.e. three times the grid spacing) of the initial electron-density map in order to reduce the computational overhead. The local correlation between the two maps will be used to determine which regions obey the NCS and is calculated by an FFT to further reduce the computational overhead. By default, the local correlation is calculated over a sphere of 6 Å radius about each point in the map. Given the two subsampled maps ρ_A and ρ_B, the correlation function C_local is given by the formula

$$C_{\mathrm{local}}(\mathbf{x}) = \frac{\langle\rho_A\rho_B\rangle - \langle\rho_A\rangle\langle\rho_B\rangle}{\left[\left(\langle\rho_A^2\rangle - \langle\rho_A\rangle^2\right)\left(\langle\rho_B^2\rangle - \langle\rho_B\rangle^2\right)\right]^{1/2}},$$

where $\langle f\rangle(\mathbf{x}) = \frac{1}{N}\sum_{|\mathbf{y}-\mathbf{x}|<r} f(\mathbf{y})$ denotes a local average and N is the number of grid points within a sphere of radius r. Each of the five local averages can be calculated by the convolution theorem, requiring two FFTs (plus one additional FFT to calculate the Fourier transform of the spherical mask), giving a total of 11 FFTs. Note that these FFTs are not calculated over the unit cell, as would normally be the case, but rather over a box containing the subsampled grid covering the region of interest. Since these maps are nonrepeating, the map must be padded with smoothed values at the edges to avoid introducing spurious high-resolution terms during the FFTs. The resulting map gives values for the local correlation of the NCS-related regions for every point in the region of interest.

The next step is to obtain some estimate of the significance of the correlation values. To do this, a similar local correlation map, calculated between two unrelated regions of density, is used to determine the expected standard deviation σC of the local correlation values from zero (i.e. the mean correlation for unrelated density regions).
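A local correlation of this kind can be illustrated in one dimension. The sketch assumes the usual Pearson form built from five local averages (of ρ_A, ρ_B, ρ_A², ρ_B² and ρ_Aρ_B), and it deliberately swaps the FFT-based convolution described in the text for direct summation over a window that is simply truncated at the edges, so it shows the quantity being computed rather than the optimized method. All names are hypothetical.

```python
import math


def local_mean(f, radius):
    """Average of f over a window of +/- radius grid points (truncated at edges)."""
    n = len(f)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(f[lo:hi]) / (hi - lo))
    return out


def local_correlation(rho_a, rho_b, radius):
    """Pearson correlation of two maps within a local window at each point."""
    ma = local_mean(rho_a, radius)
    mb = local_mean(rho_b, radius)
    maa = local_mean([a * a for a in rho_a], radius)
    mbb = local_mean([b * b for b in rho_b], radius)
    mab = local_mean([a * b for a, b in zip(rho_a, rho_b)], radius)
    out = []
    for i in range(len(rho_a)):
        var_a = max(maa[i] - ma[i] ** 2, 0.0)
        var_b = max(mbb[i] - mb[i] ** 2, 0.0)
        denom = math.sqrt(var_a * var_b)
        # Featureless regions have no defined correlation; report zero.
        out.append((mab[i] - ma[i] * mb[i]) / denom if denom > 1e-12 else 0.0)
    return out


# Hypothetical 1D "density": B is a rescaled, offset copy of A.
rho_a = [math.sin(0.7 * i) for i in range(40)]
rho_b = [2.0 * a + 1.0 for a in rho_a]
corr = local_correlation(rho_a, rho_b, 5)
print(min(corr), max(corr))
```

Because the correlation is insensitive to scale and offset, density that obeys the NCS relationship scores close to 1 even when the two copies differ in overall level, which is what makes it usable as the basis for a weighting mask.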
This standard deviation is then used to convert the local correlation map into a weighted mask function w_ncs(x), according to a formula that gives mask values increasing from 0 towards 1 as the local correlation increases above 4σC. This weighted mask is still sampled on the coarse grid. The final step is to interpolate the mask values by trilinear interpolation from the coarse grid back onto the original map grid, giving a mask covering the electron density of molecule A on the same grid as molecule A.

3. Results

The approaches described in this paper have been implemented in Parrot, an automated density-modification program. Where it has been simple to do so both the existing and new approaches have been implemented, allowing a direct comparison of the benefits of the new technique that is independent of any other implementation differences. For the remaining cases, some limited inferences may be drawn by comparison of the results from Parrot against the results from the earlier DM software. The new techniques described in the previous section will be considered in turn.

The techniques are compared here in terms of the correlation between the density-modified electron-density map and the electron density calculated from the refined structure, with a value of 1 indicating perfect phases and 0 indicating random phases. This approach has an advantage over using a simple or weighted mean phase error in that it is insensitive to changes in the phases of very weak reflections which do not affect the map significantly. (The weighted mean phase error and E-map correlation were also investigated and show similar behaviour to the map correlations presented here in most cases.)

3.1. Problem-specific histogram simulation from a known structure

A problem-specific histogram library is the only histogram technique implemented in Parrot; thus, to compare the results with the use of a standard library for a stationary-atom structure the results of Parrot (with all the other new features excluded) must be compared against the results of the DM software. The results may therefore be confounded by other differences in the software. The most notable of these is the different solvent mask-determination algorithm.

The map correlations for the basic Parrot calculation were compared with the map correlations for the DM calculation using 58 experimentally phased structures from the JCSG data archive (Joint Center for Structural Genomics, 2006) spanning the resolution range 1.4–3.2 Å. The phasing from the original JCSG structure solution using either MAD or SAD data was used as a starting point for the density-modification tests. In some cases multiple phasing calculations had been run; in this case the phasing run which produced the electron-density map with the greatest contrast (given by the r.m.s.d. of the local r.m.s.d., which is a crude indicator of map quality) was used. A list of the JCSG data sets and the corresponding phasing files used has been deposited as supplementary material to this paper.

For each structure, the Parrot result is plotted against the DM result as a scatter plot; thus, any point falling above the diagonal line y = x represents a case where Parrot gives a better map than DM. The resulting plot is shown in Fig. 2(a). Note that the new implementation in Parrot, performing a similar calculation to DM with the exception of the mask-calculation algorithm and the problem-specific histogram libraries, gives broadly similar results. Each program performs better on some structures, but the mean map correlation over all the structures is higher for Parrot (0.771 for Parrot versus 0.759 for DM).
There is, however, no obvious indication (e.g. dependence on resolution or solvent content) why one program works better than the other in any individual case.

3.2. Use of prior phase information in the calculation of figures of merit and map coefficients

In order to test the use of prior phase information, the results of Parrot were compared using both the new likelihood function incorporating the prior phase information and the Rice-function implementation (i.e. the same method used in DM). The latter set of results are the Parrot results from the previous section. The results for the new likelihood function are plotted against the results for the old function and the resulting plot is shown in Fig. 2(b). Note that the results are improved in the majority of cases and in no case does the prior phasing lead to a significantly worse result. The mean map correlation over all the structures increases from 0.771 to 0.785.

One effect of the use of prior phase information in the estimation of errors in the modified structure factors may be the reduction of bias in the density-modification calculation. Without prior phase information, the modified phases may be over-weighted by the modified magnitudes matching the observed values, a state which can be achieved without necessarily fitting the phases correctly. With prior phase information, if the modified phases are wrong and some prior phase information is present in a resolution shell against which to compare them, then those phases will contribute to a higher error estimate. As a result, the problem of bias is reduced.

3.3. Application of an anisotropy correction to the data

The effect of the anisotropy correction was tested in the same way, comparing the previous set of results against the results with the same calculation performed using the anisotropy correction. The results for the anisotropy-corrected calculation are compared with the results from the uncorrected case and the resulting plot is shown in Fig. 2(c). Note that in the majority of cases the correction makes no difference, but in a minority of cases there is a slight improvement in the results and in two cases the improvement is significant. The improvement occurs in cases where the anisotropy is large, although not all anisotropic data sets improve significantly. The results are never worse and the computational overhead is minimal.

3.4. Pairwise weighted noncrystallographic symmetry averaging

NCS averaging with a single (binary) averaging mask covering all related NCS copies of a molecule has not been implemented in Parrot and thus a direct comparison is not possible. Comparison to DM is confounded by the differences already noted in §3.1 and by the fact that averaging is not automated in DM and involves manual entry of the averaging operators. As a result, no empirical conclusions can be drawn concerning the benefits of pairwise weighted averaging in comparison to existing methods. However, a comparison between the Parrot results with and without averaging is presented as a demonstration that the method works as an automated tool for improving electron-density maps.

The map correlations from the automated NCS-averaging calculation are plotted against the results without averaging (from the previous test) and the resulting plot is shown in Fig. 2(d). Note that in about half the cases shown the results are significantly improved: these are the cases where the NCS has been correctly determined from the heavy-atom coordinates. For the remaining cases no NCS is present or the NCS could not be identified. In four cases, incorrect NCS operators are determined; however, the weighted averaging mask procedure tends to down-weight the impact of incorrect NCS, so that in only one of these cases is the difference in map correlation significant.

3.5. Other comparisons

The amount of computation required for classical and statistical density-modification methods differs substantially.
The DM calculation was very fast (a mean of 6 s per structure) and the Parrot calculation only slightly slower (a mean of 10 s per structure), while the statistical method of Pirate was approximately two orders of magnitude slower (a mean of 887 s per structure).

An important test of a density-modification technique is whether it allows an atomic model to be built into the resulting electron density. To this end, automated model-building calculations were performed using the Buccaneer model-building software (Cowtan, 2006) starting from the modified phases from each density-modification program in turn. After averaging over all the test cases to minimize variations arising from instabilities in the model-building calculation, the results were consistent with the mean map correlations reported earlier.

3.6. Future work

There is scope for further development of the methods devised here. There are no technical obstacles to implementation of resolution extrapolation beyond the limit of the observed data (Caliandro et al., 2005; Usón et al., 2007). The combination of resolution extrapolation with the likelihood-weighting methods described in §2.2 may or may not provide additional benefits.

Multi-crystal averaging, as currently implemented in the DMMULTI software, could also be implemented in Parrot. The greatest challenge here is one of automation, in particular the determination of cross-crystal averaging operators.

The speed of the program provides scope for various iterative and multi-start approaches. For example, optimization of the solvent content (as suggested by a referee) or of a data-sharpening factor could be achieved given a suitably reliable indicator of the quality of the resulting map.

4. Conclusions

Classical density-modification techniques still have significant value.
When updated to use the latest methods, in particular the use of prior phase information in the estimation of errors, they can be competitive or nearly competitive with statistical methods while requiring a fraction of the computation time. In addition, the implementation described here in the Parrot software appears to be robust when applied to data from different sources. The speed of the approach described here lends itself to particular problems, including the fast assessment of experimental data at the beamline (in combination with automated phasing and fast model-building algorithms) or use in parallel hierarchical automation models in which many structure-solution pathways are explored in parallel.

Supplementary Material

Supplementary material file. DOI: 10.1107/S090744490903947X/ba5136sup1.txt
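
The comparisons in §3.2 to §3.4 are all reported as map correlations, averaged over the test structures. The excerpt does not define the statistic, so the following is only a minimal sketch, assuming that "map correlation" means the Pearson correlation between the grid values of a density-modified map and those of a reference map sampled on the same grid. The function names `map_correlation` and `mean_map_correlation` and the toy values are hypothetical and are not taken from Parrot, DM, or Pirate.

```python
import math


def map_correlation(map_a, map_b):
    """Pearson correlation between two maps given as flat sequences of grid values.

    Assumption: both maps are sampled on the same grid, in the same order.
    """
    if len(map_a) != len(map_b) or len(map_a) == 0:
        raise ValueError("maps must be non-empty and sampled on the same grid")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    # Covariance and variances, without the common 1/n factor (it cancels).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat map has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


def mean_map_correlation(pairs):
    """Average the map correlation over (modified, reference) map pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair of maps is required")
    return sum(map_correlation(a, b) for a, b in pairs) / len(pairs)


# Toy density values (hypothetical, for illustration only).
reference = [0.1, 0.9, 0.4, 0.0, 0.7, 0.2]
modified = [0.2, 0.8, 0.5, 0.1, 0.6, 0.1]
print(map_correlation(modified, reference))
```

Under this reading, a figure such as the reported increase from 0.771 to 0.785 would correspond to calling `mean_map_correlation` once per method over the same set of structures and comparing the two averages.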


Author and article information

Journal

Journal ID (nlm-ta): J Vis Exp

Journal ID (iso-abbrev): J Vis Exp

Journal ID (publisher-id): JoVE

Title: Journal of Visualized Experiments : JoVE

Publisher: MyJove Corporation

ISSN (Electronic): 1940-087X

Publication date Collection: 2013

Publication date (Electronic): 28 June 2013

Publication date PMC-release: 28 June 2013

Issue: 76

Electronic Location Identifier: 4225

Affiliations

¹Protein Crystallization Lab, Emerald Bio

²Molecular Biology Lab, Emerald Bio

³Scientific Sales Representative, Emerald Bio

⁴Group Leader II, Emerald Bio

⁵Group Leader I, Emerald Bio

⁶Chair of Advisory Board, Emerald Bio

⁷Director of Multi-Target Services, Emerald Bio

⁸Senior Project Leader, Emerald Bio

⁹Project Leader II & SSGCID Site Manager, Emerald Bio

Author notes

Correspondence to: Bart L. Staker at bstaker@embios.com

Article

Publisher ID: 4225

DOI: 10.3791/4225

PMC ID: 3747311

PubMed ID: 23851357

SO-VID: 835b2ccf-8ac2-4ddb-a83e-b720828b35c0

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial License, which permits non-commercial use, distribution, and reproduction, provided the original work is properly cited.
