      Structure of the Alternative Complex III in a supercomplex with cytochrome oxidase

          Abstract

          Alternative Complex III (ACIII) is a key component of the respiratory and/or photosynthetic electron transport chains of many bacteria 1–3 . Like Complex III (bc 1 complex), ACIII catalyzes the oxidation of membrane-bound quinol and reduction of cytochrome c or an equivalent electron carrier. However, the two complexes have no structural similarity 4–7 . Although ACIII has eluded structural characterization, several of its subunits are homologous to members of the CISM (Complex Iron-Sulfur Molybdoenzyme) superfamily 8 , including the proton pump polysulfide reductase 9,10 . We isolated the ACIII from Flavobacterium johnsoniae with native lipids using styrene maleic acid (SMA) copolymer 11–14 , both as an independent enzyme and as a functional 1:1 supercomplex with an aa 3-type cytochrome c oxidase (cyt aa 3). We determined the structure of ACIII to 3.4 Å resolution by cryo-EM and constructed an atomic model for its six subunits. The structure, which contains a [3Fe-4S] cluster, a [4Fe-4S] cluster, and six hemes c, shows that ACIII employs known elements from other electron transport complexes arranged in a previously unknown manner. Modeling of the cyt aa 3 component of the supercomplex revealed that it is structurally modified to facilitate association with ACIII, illustrating the importance of the supercomplex in this electron transport chain. The structure also resolves two of the subunits of ACIII that are anchored to the lipid bilayer with N-terminal triacylated cysteine residues, an important post-translational modification found in numerous prokaryotic membrane proteins that has not previously been observed structurally in a lipid bilayer. The ACIII-cyt aa 3 supercomplex from F. johnsoniae membranes was solubilized, purified, and biochemically characterized using styrene maleic acid (SMA) copolymer nanodiscs without traditional detergents (Supplementary Discussion, Extended Data Fig. 1–3). 
The supercomplex catalyzes the two-electron oxidation of menaquinol (or ubiquinol) and the four-electron reduction of O2 to water with a turnover number of ~21 electrons/sec without the addition of exogenous cyt c (Supplementary Information, Extended Data Fig. 3), indicating a functional electron transfer chain within the supercomplex. The addition of exogenous cyt c did not increase the rate of electron transfer. The structure of the ACIII-cyt aa 3 supercomplex in SMA nanodiscs was determined by cryo-EM (Fig. 1, Extended Data Fig. 4). The supercomplex has a mass of 464 kDa (Supplementary Discussion), a transmembrane cross-section of ~9 nm × 13 nm (Extended Data Fig. 5), and contains 48 transmembrane α-helices. To our knowledge, the ACIII-aa 3 supercomplex is the largest protein complex reported to be contained within an SMA copolymer nanodisc. The SMA copolymer and lipids contribute only a thin layer of density around the supercomplex (Fig. 1a and 1b), which is not circular but follows the contours of the protein. Whether this is a general feature of SMA-solubilized proteins or is due to the large size of the ACIII-aa3 supercomplex is not known, and will be clarified when more structures are determined using this approach. The number of loosely bound, unresolved lipid molecules is not known, nor whether they are in sufficient number to form a true bilayer surrounding the protein. The SMA-supercomplex nanodiscs retain native lipids, are more stable, and have 30% higher specific activity than the supercomplex isolated with detergents (e.g., dodecylmaltoside) (Supplementary Discussion, Extended Data Fig. 3). Since traditional detergents are avoided in generating SMA nanodiscs, the preparative protocol is relatively rapid, and simpler than making nanodiscs using the membrane scaffold protein (MSP). 
While the properties of the SMA nanodiscs are less well characterized than nanodiscs made with membrane scaffolding protein 13,15 , our work demonstrates the utility of SMA nanodiscs for high-resolution structural studies of membrane proteins. The resolution of the cryo-EM density map allowed construction of an atomic model for > 90% of the sequences predicted from the ACIII gene cluster (Supplementary Discussion), including subunits ActA, ActB, ActC, ActD, ActE, and ActF (Fig. 2 and Extended Data Fig. 5, Extended Data Table 1). Sequence analysis shows that ACIII contains a unique combination of known modules from other respiratory complexes 3 (Supplementary Discussion). The ACIII structure confirms this prediction and shows the structure responsible for catalyzing the quinol:cytochrome c oxidoreductase activity. The ACIII structure can be divided into three parts: (i) a core assembly of ActC and ActB that oxidizes quinol; (ii) a heme c assembly consisting of ActA and ActE that directs electrons from ActB to the terminal electron acceptor; and (iii) auxiliary transmembrane subunits ActD and ActF with unknown functions. With some key differences (Extended Data Fig. 5), the overall architecture of ActB and ActC resembles the PsrA/PsrB and PsrC subunits of polysulfide reductase from Thermus thermophilus (PsrABC) 16 , a member of the CISM superfamily (Supplementary Discussion). Like PsrC, ActC contains no cofactors but does contain the proposed site for menaquinol oxidation. Residues at the menaquinol binding site identified in PsrC 16 are not conserved in ActC 17 . Although menaquinone is not observed in the cryo-EM map, we propose that His133C and Asp164C form the menaquinol binding site in ActC near the interface with ActB (Extended Data Fig. 6). These two residues are conserved in ActC sequences and there is a crevice between TM3 and TM4 of ActC that would provide access to the substrate in the membrane bilayer. 
The N-terminal portion of ActB is homologous to the PsrA subunit of polysulfide reductase, which contains the molybdenum cofactor, but the molybdenum cofactor is absent in ActB 4 . The C-terminal domain of ActB is homologous to PsrB and both ActB and PsrB contain Fe-S clusters. Like PsrB, ActB from F. johnsoniae is expected to contain 4 [Fe-S] clusters, but only two are observed in the cryo-EM map (Extended Data Fig. 7). There is one [3Fe-4S] cluster near the interface with ActC, about 10 Å from the proposed site of menaquinol oxidation, and one [4Fe-4S] cluster about 9 Å further away. There are two additional cysteine clusters present in the structure of ActB, but the cryo-EM map does not show [Fe-S] clusters at these locations. Instead, we observe disulfide bonds (Cys965B/Cys938B and Cys971B/Cys769B) within these two cysteine clusters. Substitution of proposed [4Fe-4S] clusters by disulfide bonds may be a genuine aspect of the structure or may be due to oxidation that occurred during sample preparation. However, if these two “missing” [4Fe-4S] clusters were present, they would form a dead-end for electron transfer from the [3Fe-4S] cluster of ActB suggesting that their absence from the structure is not an artifact. The [3Fe-4S] cluster in ActB is the most probable initial oxidant of menaquinol bound to ActC, and is 12.3 Å from the nearest heme c in ActA. The five hemes c in ActA plus the single heme c in ActE form a likely electron transfer wire from the [3Fe-4S] cluster in ActB, with the largest edge-to-edge distance of 9.2 Å between adjacent hemes (Fig. 2b). The [4Fe-4S] cluster in ActB appears to be off-pathway and its function remains to be determined. In all Flavobacteria, including F. johnsoniae, ActA is predicted to have a monoheme domain at the N terminus in addition to the pentaheme domain at the C terminus (Supplementary Discussion). Mass spectrometry analysis shows that the N-terminal monoheme domain is present in the preparation (Extended Data Fig. 
1), but no density can be assigned to this entire domain. The inability to resolve the monoheme domain may result from flexibility of the domain. Full-atom molecular dynamics (MD) simulations were performed for the entire structure of ACIII embedded in a phospholipid bilayer to determine the stability and dynamics of the structure (Extended Data Fig. 8). Interestingly, the pentaheme domain of ActA had the largest root-mean-square deviation (RMSD), which is mainly from the transmembrane α-helix connected to the missing monoheme domain, consistent with the monoheme domain being unobservable due to a variable position in the complex. While ActE also had a significant RMSD, that did not appear to correlate with disorder in the cryo-EM map. ActD and ActF are transmembrane subunits without bound cofactors, and both interact with ActC. It has not been established if ACIII generates a proton motive force coupled to electron transport 18 . The absence of redox centers in ActC, ActD and ActF suggests that if ACIII contributes to the transmembrane proton gradient, it does not use the bifurcation-type Q cycle mechanism of canonical Complex III 19 , but instead uses a true proton pump mechanism like Complex I 20 . ActD has two transmembrane α-helices that cross within the membrane and are adjacent to ActC. Both N- and C-termini are within the cytoplasm and combine to form a single globular domain that rests on the cytoplasmic surface of ActC. The ten transmembrane α-helices of ActF form a pseudo two-fold axis of symmetry with the ten transmembrane α-helices of ActC (Extended Data Fig. 5) despite the fact that ActF has less than 20% sequence identity with ActC. If ACIII is a proton pump, it is likely that conserved polar residues within the bilayer will play important roles. The structure of ACIII reveals eleven ordered phospholipid molecules as well as triacylated cysteine residues at the N termini of ActB (Fig. 3a) and ActE (Extended Data Fig. 7). 
Anchoring of bacterial membrane proteins by an N-terminal triacylated cysteine is a well characterized phenomenon 21 but, to our knowledge, this is the first time the structure of a triacylated cysteine residue has been determined in the context of a membrane protein. Surprisingly, both lipid anchors are tilted with respect to the plane of the lipid bilayer (Fig. 2a), restricting the ability of other lipids to pack around them. This feature could alter the mechanical properties of the adjacent portion of the membrane bilayer and also guide conformational changes in the ACIII protein. Remarkably, the two N-terminal lipid anchors are adjacent to each other in the membrane. These lipid anchors likely help ACIII to assemble and keep the monoheme ActE bound to the complex. The eleven lipids that are resolved adjacent to the transmembrane α-helices accommodate the rugged protein surface of the complex (Fig. 3b and Extended Data Fig. 7). The headgroups of the lipids could not be identified and they were all modeled as phosphatidylethanolamine. There are two “hot spots” for resolved lipids: (i) The cytoplasmic interface between ActC and ActF; and (ii) The vicinity of the triacylated cysteine of ActB, which is near the proposed entry point for menaquinol into the complex. All eleven of the resolved lipids remained bound to the protein throughout 250 ns of MD simulation (Extended Data Fig. 8), supporting the ability of SMA-nanodiscs to preserve some native lipid-protein interactions and suggesting a functional role for the lipids. A large number of annular lipids, including those modelled in the structure, were observed to associate with the protein from the in silico bilayer. Frequently, the subunits encoding ACIII are within an operon that includes subunits of an associated Complex IV 3 (cyt aa 3 or cyt caa 3). 
We find that the sequences of subunit III from Complex IVs that are associated with ACIII have unusual features that distinguish them from the canonical subunit III (Supplementary Discussion). Whereas subunit III of Complex IV generally contains seven transmembrane α-helices, those that are associated with ACIII lack TM1 and TM2 (Fig. 4a). While only parts of subunit III of cyt aa 3 are resolved at better than 4 Å, the density for cyt aa 3 has sufficient resolution to identify five α-helices from the structure. A homology model of subunit III from F. johnsoniae cyt aa 3 was built based on the structure of TM3 to TM7 of subunit III from Rhodobacter sphaeroides cyt aa 3 and fit into the ACIII-cyt aa 3 supercomplex density map (Extended Data Fig. 9) with high fidelity. The deletion of the first two transmembrane α-helices in subunit III of cyt aa 3 appears to be a necessary adaptation to allow formation of the supercomplex with ACIII. It is interesting that the same two helices in subunit III are also absent in the cyt aa3 of the obligatory bcc-cyt aa3 supercomplex found in Actinobacteria (e.g., Corynebacterium glutamicum and Mycobacterium tuberculosis) 22 . The sequence analysis also reveals that the loop between TM5 and TM6 of subunit III in the cyt aa3 that is part of the supercomplex is much longer in F. johnsoniae (and all Flavobacteria) than in other organisms. Typically, this loop contains eight residues, but in F. johnsoniae, it contains 121 residues (Fig. 4a). Part of this long loop fits in a groove between ActB and ActD of ACIII on the periplasmic side of the membrane (Extended Data Fig. 9). The structural model reveals a π-cation interaction between Trp188 of subunit III and Arg868 of ActB (Fig. 4b), both of which are conserved among organisms containing subunit III with a long loop between TM5 and TM6 (Extended Data Fig. 9). 
This specific and strong interaction stabilizes the ACIII-cyt aa 3 supercomplex and appears to be a second adaptation that enables supercomplex formation with ACIII. The contact between the periplasmic loop of subunit III of cyt aa 3 and ACIII is the only observed direct contact between the two complexes. The five well-resolved transmembrane α-helices of subunit III of cyt aa 3 are angled away from ACIII with only the tip of TM6 of subunit III touching ActF, forming a wedge-like space between the membrane domains of ACIII and cyt aa 3. The fall-off of resolution in the portions of cyt aa 3 that are distant from the interface with ACIII suggests that there may be multiple conformations of the supercomplex that are all tethered by the loop in cyt aa 3. The loop could, therefore, serve as a hinge, allowing the membrane domains of ACIII and cyt aa 3 to swing into contact transiently. Using the location of TM3 to TM7 of subunit III within the supercomplex as a guide allows a model of the entire cyt aa 3 to be placed within the density map for the supercomplex (Extended Data Fig. 9). In the resulting model, there is a considerable distance (56 Å) between the heme c in ActE and CuA within subunit II of cyt aa 3. Electron transfer within the supercomplex does not require the addition of exogenous cyt c, which is also the case for the bcc-cyt aa3 supercomplex from Corynebacterium glutamicum 23 . It is possible, though it seems unlikely, that there is a subset of conformations in which ActE comes close enough to cyt aa 3 for direct electron transfer. It is noteworthy that the monoheme domain of ActA has substantial sequence homology (~30% identity) with the heme c domain that is present at the C terminus of subunit II of cyt caa 3 from Thermus thermophilus. 
This observation suggests that the ActA monoheme domain, which we postulate to be highly mobile in the structure (see above), may be able to interact with subunit II of cyt aa 3 and shuttle electrons from the ActE monoheme domain to subunit II of cyt aa 3. Hence, electron transfer within the supercomplex may require the monoheme domain of ACIII to swing back and forth between ACIII and cyt aa 3 to shuttle electrons (Fig. 4c). Additional experimental work will be required to test this model and, indeed, to determine the physiological advantage of forming the supercomplex.

METHODS

Bacterial strain and growth conditions

Flavobacterium johnsoniae ATCC 17061™ strain UW101 was used in this study. The strain was a kind gift from Dr. Mark McBride at the University of Wisconsin, Milwaukee. The cells were grown in casitone-yeast extract (CYE) medium at 30 °C under highly aerobic conditions (500 ml cultures in 2 L flasks) 24 .

Membrane preparation and protein purification

Cells grown overnight were collected by centrifugation (14,000 × g for 10 min). The cell pellet from 12 L of culture (~2.5 g/L) was resuspended in ~200 ml of 20 mM Tris-HCl buffer, pH 8 (buffer A) with 5 mM MgSO4, DNAse I (Sigma) and a protease inhibitor cocktail (Sigma). This suspension was passed three times through a Microfluidizer at a pressure of 80,000 psi to disrupt the cells. The cell extract was centrifuged at 14,000 × g for 10 min to remove unbroken cells. Membranes were obtained after centrifugation at 185,500 × g for 4 h. Under the above growth conditions, the membranes contained ACIII, cyt aa3 and cyt bd. The membrane pellet was solubilized by using either a traditional detergent or the SMA copolymer.

Purification using Triton X-100 and DDM

The membrane pellet was resuspended in buffer A (~50 mg/ml) along with 300 mM NaCl, and solubilized by the addition of Triton X-100 (Fisher Scientific) to a final concentration of 4%. The solution was incubated at 4 °C for 2 h with mild agitation. 
The suspension was cleared by centrifugation at 185,500 × g for 1 h, after which the detergent was diluted 4-fold by adding three volumes of buffer A to the supernatant. The diluted supernatant was then added to a chromatography column containing 10 ml of Ni-NTA resin (Qiagen) pre-equilibrated with 20 mM Tris-HCl pH 8 containing 0.05% Triton X-100 and 0.15 M NaCl (buffer B). The resin was washed with about 10 column volumes of buffer B to remove any unbound sample. Detergent exchange to DDM (Anatrace) was carried out by washing with buffer B containing 0.05% DDM instead of Triton X-100 (buffer C). The column was further washed with 5 column volumes of buffer C containing 10 mM imidazole to remove the loosely bound proteins from the resin. The proteins that were well bound to the resin were eluted using 100 mM imidazole in buffer C. The eluent was concentrated to ~3 ml using Amicon Ultra-15 filters (Millipore) with a 100 kDa cutoff. The excess imidazole was removed by dialysis against buffer C. The yield of protein obtained was about 0.3 mg/L of ACIII and 0.16 mg/L of cyt aa3 from 12 L of culture. When indicated, the proteins were further purified by gel filtration chromatography using Superdex 200 10/300 GL (GE Healthcare Life Sciences). The purified proteins were stored at −80 °C after adding glycerol to a final concentration of 10%.

Purification using SMA copolymer

The SMA copolymer SMA® 3000HNA (styrene maleic acid copolymer ca. 3:1 molar ratio of styrene:maleic acid) was kindly provided as a gift from Dr. Terry Bricker (Louisiana State University, Baton Rouge, LA) who used SMA copolymer made by Cray Valley USA LLC (now Total Petrochemicals & Refining USA Inc.) successfully for the studies of photosystem from spinach thylakoids 25 . Additional SMA® 3000HNA was kindly provided by Total Petrochemicals & Refining USA (Houston, TX) as an aqueous solution of 25.6% (w/v) SMA. We also used a similar product, Xiran SL25010 S25, kindly provided by Polyscope Polymers B. 
V., with similar results. These polymer preparations are provided as aqueous solutions of the sodium salt, and the polymer solutions were simply diluted to the final desired percentage to use directly for solubilization of membranes. The purification protocol with the SMA copolymer was similar to that described with detergents with the following differences. After the membrane pellet was resuspended, the SMA solution was added dropwise to a final concentration of 1% with continuous stirring. After incubation for 1 h at room temperature, the solution was centrifuged at 185,500 × g for 1 h to remove unsolubilized particles. The supernatant was loaded directly to the Ni-NTA column equilibrated with 20 mM Tris-HCl pH 8, 0.15 M NaCl. The remaining steps of the purification were as described above. Following solubilization of the membrane suspension with 1% SMA 3000HNA, no additional SMA or detergent was added, and none was needed to maintain the solubilized proteins in solution. The yield of protein after the use of the SMA copolymer for solubilization was about 0.5 mg/L for ACIII and about 0.15 mg/L for cyt aa3 from 12 L of culture.

Analytical methods

The total protein concentration was determined using the BCA kit (Thermo Scientific, Pierce Protein Research Products). The UV–visible spectra of the oxidized and reduced proteins were recorded on an Agilent Technologies spectrophotometer (model 8453). The pyridine hemochrome assay 26 was used to determine the concentration of hemes present in the protein samples. The total heme c concentration was divided by seven to calculate the ACIII concentration and the total heme a concentration was divided by two to calculate the cyt aa3 concentration. The purified protein was analyzed by SDS-PAGE using 4-20% precast gels (Nusep Tech). Heme staining was carried out using 3,3′,5,5′-tetramethylbenzidine (TMBZ) 27 . 
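The conversion from heme concentrations to complex concentrations described above is simple division, and can be sketched as follows. This is an illustrative sketch only: the function name and the input values are hypothetical and are not data from this study; only the stoichiometries (seven heme c per ACIII, two heme a per cyt aa3) come from the text.

```python
from typing import Tuple

HEMES_C_PER_ACIII = 7  # from the text: total heme c is divided by seven
HEMES_A_PER_AA3 = 2    # from the text: total heme a is divided by two


def complex_concentrations(heme_c_uM: float, heme_a_uM: float) -> Tuple[float, float]:
    """Return (ACIII, cyt aa3) concentrations in uM from heme concentrations in uM."""
    if heme_c_uM < 0 or heme_a_uM < 0:
        raise ValueError("heme concentrations must be non-negative")
    return heme_c_uM / HEMES_C_PER_ACIII, heme_a_uM / HEMES_A_PER_AA3


# Hypothetical pyridine hemochrome readings, for illustration only.
aciii_uM, aa3_uM = complex_concentrations(heme_c_uM=14.0, heme_a_uM=3.0)
print(f"ACIII = {aciii_uM:.2f} uM, cyt aa3 = {aa3_uM:.2f} uM")
```

Keeping the stoichiometries as named constants makes the assumption behind each concentration explicit, which matters if a preparation has lost a heme-containing subunit or domain.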
The supercomplex was visualized by blue native PAGE (BN-PAGE) using a 4–16% gel (Novex, Life Technologies) with Bis-Tris buffer. The entire gel was stained with Coomassie blue, and then fixed with 30% methanol and 10% glacial acetic acid. The gel was destained with 8% glacial acetic acid to visualize the bands. Peptide mass spectrometry and analysis were carried out by Dr. Peter Yau at the Roy J. Carver Biotechnology Center (CBC) at the University of Illinois at Urbana-Champaign.

Oxygen consumption assay

Oxygen consumption was measured using a Clark electrode (Strathkelvin) in a 1 ml chamber at 25 °C as described in 28 . The reaction mix consisted of 100 μM ubiquinone-1 (Q1) (Sigma-Aldrich) and 5 mM dithiothreitol (DTT) in air-saturated 0.1 M potassium phosphate buffer, pH 7.5 with 150 mM NaCl. The reaction was started by adding the purified protein into the chamber. The initial concentration of oxygen was calculated to be 237 μM.

Quinol:cytochrome c oxidoreductase activity

The quinol:cytochrome c oxidoreductase activity of the ACIII was measured spectrophotometrically as described in 29 . The reaction was carried out in a 2 ml anaerobic cuvette, at 25 °C in 50 mM potassium phosphate buffer, pH 7.5 in the presence of 50 μM horse heart cytochrome c (Sigma-Aldrich) and 200 μM KCN. Ubiquinol-1 (Q1H2) or reduced vitamin K2 (Sigma-Aldrich) were used as quinol substrates and, in each case, the quinone was reduced using sodium borohydride according to the method by Ragan 30 . The reaction was started by the addition of 100 μM of reduced quinone.

EPR spectroscopy

The purified ACIII-cyt aa3 supercomplex was extensively dialyzed against 20 mM Tris-HCl buffer, pH 8, with 150 mM NaCl and 1 mM EDTA to eliminate adventitious transition metal ions. The sample was concentrated in an Amicon filter to 150 μl with a final ACIII concentration of ~60 μM. The air-oxidized sample was directly transferred to an X-band EPR tube and subsequently frozen in liquid nitrogen. 
The sample was oxidized completely by the addition of 2 mM potassium ferricyanide. Glycerol (5%) was present in all EPR samples. Continuous Wave (CW) EPR measurements were carried out on an X-band Varian EPR-E122 spectrometer at the Electron Paramagnetic Resonance facility at the University of Illinois at Urbana-Champaign. Cryogenic conditions below 77 K were achieved with a Lakeshore 331 temperature controller using a regulated flow of helium gas.

Metal Analysis

Metal Analysis was carried out using inductively coupled plasma mass spectrometry (ICP-MS) as previously described 31,32 .

Optical Redox Titration

Full spectrum UV-visible redox titrations were performed to determine the midpoint potentials (Em°) of the redox-active cytochromes in the DDM-solubilized ACIII-cyt aa 3 supercomplex 33,34 . The purified supercomplex was suspended in 4 ml of 50 mM potassium phosphate buffer (pH 7.0) to a concentration of 3 μM with 25 μM each of the following redox mediators: benzyl viologen (Em,7 = −350 mV), anthraquinone-2-sulfonate (Em,7 = −225 mV), 2-hydroxy-1,4-naphthoquinone (Em,7 = −220 mV), 9,10-anthraquinone-2,6-disulfonate (Em,7 = −185 mV), duroquinone (Em,7 = 5 mV), N-ethylphenazonium ethosulfate (Em,7 = 65 mV), N-methylphenazonium methosulfate (Em,7 = 85 mV), diaminodurene (Em,7 = 275 mV), 2,6-dimethyl benzoquinone (Em,7 = 180 mV), 1,2-naphthoquinone (Em,7 = 143 mV), 1,4-naphthoquinone (Em,7 = 36 mV) and potassium ferricyanide (Em,7 = 435 mV) 35 . Titrations were performed with an anaerobic stirred cuvette and the solution potential was adjusted by injecting aliquots of 10 mM sodium dithionite or potassium ferricyanide as reductant and oxidant, respectively. Spectra were taken at approximately 10–20 mV increments over the titration range indicated. Spectroscopic changes of the α-bands of the hemes upon reduction or oxidation were monitored at the peak maxima to determine the midpoint potentials of each class of heme center. 
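For orientation, the relationship that underlies such a titration curve can be sketched in a few lines. This is a minimal illustration of the Nernst equation for a single redox center, not the fitting procedure used in this work. The function name, the one-electron assumption (n = 1) and the temperature of 25 °C are assumptions made for the example.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def fraction_reduced(e_mV: float, em_mV: float, n: int = 1, temp_K: float = 298.15) -> float:
    """Nernst fraction of a redox center that is reduced at solution potential e_mV.

    em_mV is the midpoint potential; n = 1 and 25 degrees C are assumed defaults.
    """
    exponent = n * F * (e_mV - em_mV) / 1000.0 / (R * temp_K)
    return 1.0 / (1.0 + math.exp(exponent))


# At the midpoint potential, half of the population is reduced.
print(fraction_reduced(e_mV=100.0, em_mV=100.0))
# About 59 mV above the midpoint, the oxidized:reduced ratio is roughly 10:1.
print(fraction_reduced(e_mV=159.16, em_mV=100.0))
```

A titration with several classes of heme would be described by a weighted sum of such terms, one per midpoint potential, with the weights reflecting each class's contribution to the absorbance change.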
The data sets were analyzed using Origin™ (Origin Lab Corporation) to determine spectral components and fit titration curves using the Nernst equation 35 .

EM sample preparation

Holey carbon film-coated EM grids were nanofabricated with regular arrays of 500 to 800 nm holes 36 and coated with an additional layer of gold. Cryo-EM specimens were prepared with an FEI Vitrobot grid preparation robot at 4 °C and 100% humidity by applying 3 μl of sample (3 mg/ml) to glow-discharged grids, allowing the grids to equilibrate for 1 s, and blotting for 12 s before freezing in a liquid ethane/propane mixture (1:1 v/v) 37 . Grids were subsequently stored in liquid nitrogen before shipping to the New York Structural Biology Center for imaging with an FEI Titan Krios electron microscope equipped with a Gatan K2 Summit camera and automated with Leginon 38 .

EM data acquisition

Movies were acquired in electron counting mode with a pixel size of 1.1 Å, an exposure rate of 7.4 electrons/pixel/s, and a total exposure time of 10 s divided into 40 frames (418 movies) or 50 frames (1599 movies). Frame alignment and exposure weighting were performed with Motioncor2 39 . After screening averages from the aligned movies, 475 movies were discarded because of excessive movement, low defocus, high defocus, or over-focus. CTF parameters were estimated from the exposure-weighted averages of movie frames with CTFFIND4 40 .

Image processing

3044 particle images were manually selected and subjected to 2D classification with Relion 1.4 41 . The resulting 2D classes were used as templates for automatic selection of 899405 particle images 42 . The number of particle images was reduced to 693416 by further 2D classification. Subsequent image processing was carried out in cryoSPARC 43 . 
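The acquisition parameters given under EM data acquisition imply a total exposure per unit area and a per-frame exposure, which can be worked out as in the sketch below. The variable and function names are illustrative, and the movie tally assumes that the 475 discarded movies come out of the 418 + 1599 movies collected.

```python
PIXEL_SIZE_A = 1.1      # Å per pixel (from the text)
EXPOSURE_RATE = 7.4     # electrons/pixel/s (from the text)
EXPOSURE_TIME_S = 10.0  # s (from the text)


def total_exposure_per_A2(rate_e_per_px_s: float, time_s: float, pixel_size_A: float) -> float:
    """Total exposure in electrons/Å^2: electrons per pixel divided by pixel area."""
    return rate_e_per_px_s * time_s / pixel_size_A ** 2


total = total_exposure_per_A2(EXPOSURE_RATE, EXPOSURE_TIME_S, PIXEL_SIZE_A)
print(f"total exposure: {total:.1f} e/Å^2")

# Per-frame exposure for the two frame counts used.
for n_frames in (40, 50):
    print(f"{n_frames} frames: {total / n_frames:.2f} e/Å^2 per frame")

# Movie bookkeeping (assumes the discarded movies are a subset of those collected).
movies_collected = 418 + 1599
movies_kept = movies_collected - 475
print(f"movies kept: {movies_kept} of {movies_collected}")
```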
An initial map of ACIII-cyt aa 3 was obtained by ab initio 3D classification, refined to 4.1 Å resolution, and used as a reference for the multi-refine procedure in cryoSPARC producing initial maps of the ACIII and ACIII-cyt aa3 structures. 164239 particle images were used to refine the ACIII-cyt aa 3 map to 3.4 Å resolution, but this map showed the cyt aa 3 portion of the complex with lower density than the ACIII part. Maps with uniform density for ACIII-cyt aa 3 and ACIII, both at 3.6 Å resolution, were calculated from 81530 and 51547 particle images, respectively.

Model building

The 3.4 Å resolution density map was used for the de novo model building of ACIII. The density map was first segmented with UCSF Chimera 44 to facilitate the identification of subunits. The connectivity of each segmented map was further examined and the result was compared with topology predictions from TOPCONS 45 and secondary structure prediction from Jpred 46 to validate the subunit assignment and identify the directionality of the peptide chain. With this information, model building was carried out manually in Coot 47 . Individual chains were first traced in Cα baton mode. Readily interpretable features from the density map, including regions rich in bulky residues, triacylated cysteines, and axial ligands of heme c, were used to register the structure to the sequence. Stretches of ~20 amino acids were built progressively around these registration points and assembled as a single chain in Coot. All six subunits of ACIII were combined and refined with phenix.real_space_refine 48 . For cofactors, the starting models were taken directly from the CCP4 ligand library. Cofactors were docked to the density map with Coot and merged with the apo protein structure. The complete structure was then refined with phenix.real_space_refine with geometric constraints for the protein-cofactor coordination. 
The final model was further examined in Coot to remove amino acid sidechains with ambiguous orientations and further validated with MolProbity 49 and EMringer 50 . All identified lipids with two acyl tails were modeled as phosphatidylethanolamine (PE) with palmitoyl tails. The conformation of PE was refined with interactive Molecular Dynamics Flexible Fitting (iMDFF) in the presence of the protein structure using VMD 51 . Lipid tails were then truncated according to the density map. The 3.6 Å resolution ACIII-cyt aa 3 density map was used for the model building of cyt aa 3. Part of the subunit III loop region was manually built in Coot. Homology models for individual subunits were generated with the RaptorX server 52 and docked into the density map with UCSF Chimera. The model for the ACIII-cyt aa 3 supercomplex was assembled by fitting the ACIII structure to the ACIII-cyt aa 3 map and placing the cyt aa 3 structure from Rb. sphaeroides (PDB 1M56) into the map based on the position of cyt aa 3 subunit III.

Bioinformatic analysis

Homologous protein sequences were retrieved using the NCBI blastp server 53 . The blastp results were analyzed with the pandas and Biopython modules. Sequence hits were filtered based on coverage and sequence identity. Representative sequences were selected based on sequence identity to maintain the variations in sequence and aligned using the Clustal Omega server 54 . Figures for sequence alignment were prepared using the ESPript 3.0 server 55 .

Simulation system preparation

The initial ACIII structure for the MD simulation was obtained from the refined structure determined by cryo-EM. Eleven 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE) lipids resolved by cryo-EM were added to the ACIII system, which was subsequently embedded in a POPE membrane bilayer, solvated with the TIP3P water model 56 , and ionized with 150 mM NaCl. 
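As a rough guide to what ionizing a simulation box to 150 mM NaCl involves, the number of ion pairs follows from the salt concentration and the solvent volume. The sketch below is an illustration under stated assumptions, not the procedure used here: the function name and the solvent volume are hypothetical, and system-building tools typically also add counterions to neutralize any net charge, which is not covered.

```python
AVOGADRO = 6.02214076e23  # particles per mole
LITERS_PER_A3 = 1e-27     # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def ion_pairs_for_concentration(conc_molar: float, solvent_volume_A3: float) -> int:
    """Number of NaCl ion pairs giving conc_molar in a solvent volume given in Å^3."""
    return round(conc_molar * AVOGADRO * solvent_volume_A3 * LITERS_PER_A3)


# Hypothetical solvent volume of 1.0e6 Å^3 (e.g. a 100 Å cube of water),
# chosen only to illustrate the calculation.
n_pairs = ion_pairs_for_concentration(conc_molar=0.150, solvent_volume_A3=1.0e6)
print(f"{n_pairs} Na+ and {n_pairs} Cl- ions")
```

The volume that matters is that of the water phase rather than the whole box, since the membrane and protein exclude solvent.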

ReMDFF simulation

Resolution-exchange Molecular Dynamics Flexible Fitting (ReMDFF) 57 was used for structure refinement, with the CHARMM 36m force field for proteins 58 and CHARMM 36 force field for lipids 59 . Force field parameters for hemes and iron-sulfur clusters came from previous studies 60–63 . The fitting was performed in vacuum in the presence of a grid potential derived from the experimental density map (coupling factor 0.3). Secondary structure restraints, cis-peptide bond restraints, and chirality restraints were applied to the protein. Hemes and iron-sulfur clusters were harmonically restrained (k = 50 kcal/mol/Å²). A Langevin thermostat 64 was used for maintaining the average temperature at 80 K. The MD integration time step was 1 fs. A cutoff radius for nonbonded interactions was set to 10 Å with a switching function taking effect at 9 Å. A total of six replicas were used together with six grid potentials of decreasing resolutions. Each was first energy-minimized for 2000 steps and then equilibrated for 1 ps. Finally, 2000 replica exchanges were attempted with 1 ps between attempts.

MD simulation

The ACIII systems were simulated with NAMD 2.12 using the same force field parameters as in ReMDFF. The system was energy-minimized for 3000 steps using the conjugate gradient algorithm 65 with linear searching 66 , and equilibrated for 0.5 ns to relax lipid tail group atoms while keeping the lipid phosphorus atoms and protein (including hemes and iron-sulfur clusters) heavy atoms harmonically restrained (k = 1 kcal/mol/Å²). This procedure was followed by a 10-ns simulation to allow lipids to relax around the proteins while keeping the protein backbone and heavy atoms from iron-sulfur clusters and hemes harmonically restrained (k = 1 kcal/mol/Å²). Restraints were gradually released over the next 5 ns and the simulation continued without any biasing potential for a total of 250 ns. 
The angles in the iron-sulfur clusters were harmonically restrained to their initial values (k = 300 kcal/mol/deg) throughout the simulation.

Data availability

All relevant data are available from the corresponding authors upon reasonable request and/or included in the manuscript or Supplementary Information. Three cryo-EM maps mentioned in this work have been deposited in the Electron Microscopy Data Bank (EMDB) under accession codes EMD-7286 (Combined), EMD-7447 (ACIII-cyt aa3) and EMD-7448 (ACIII). The coordinates of the atomic model of the Alternative Complex III built from EMD-7286 have been deposited in the Protein Data Bank (PDB) under accession code 6BTM.

Extended Data

Extended Data Figure 1 Expression and spectroscopic characterization of the ACIII/cyt aa3 supercomplex

a, A schematic representation of the respiratory chain of F. johnsoniae. b, UV/visible spectroscopy and SDS-PAGE of the membranes from F. johnsoniae. On the left is the difference spectrum of the F. johnsoniae membranes, obtained from the spectrum of the air-oxidized membranes and the spectrum after reduction with dithionite. The wavelengths associated with the heme peaks are 605 nm, 560 nm, 552 nm and a broad peak at 630 nm for hemes a, b, c and d, respectively. On the right, SDS-PAGE of the membranes followed by heme staining of the gel shows bands corresponding to the cytochrome subunits ActA (48 kDa) and ActE (20 kDa) of ACIII but nothing corresponding to the cytochrome subunit (~35 kDa) of the cbb3 oxidase. c, The gene arrangement for the ACIII and the cytochrome oxidase aa3 genes in the F. johnsoniae genome. The genes for subunits I and II of the cyt aa3 oxidase are found immediately downstream of the act genes of the ACIII. Two different versions of subunit III are denoted as vI and vII. d, Reduced and oxidized UV/visible spectra of the supercomplex in detergent and SMA nanodiscs.
The dithionite reduced form of the samples is represented in red and shows the peaks for heme c at 524 nm and 552 nm and those for heme a at 443 nm and 605 nm. e, Pyridine hemochrome assay of the ACIII:aa 3 supercomplex in SMA nanodiscs. Plotted is the reduced-minus-oxidized difference spectrum of the pyridine hemochromes of the sample. Peaks at 520 nm and 550 nm are associated with heme c and peak at 590 nm is associated with heme a. Quantitation from the spectrum shows a ratio of 10.6:1 between heme c and heme a, which translates into a 3:2 ratio between ACIII and aa 3 assuming 7 heme c per ACIII and 2 heme a per aa 3. Data in b are representative of two independent experiments with similar results and data in d,e are representative of six independent experiments with similar results. Extended Data Figure 2 Component and size analysis of the ACIII/cyt aa 3 supercomplex a, SDS-PAGE of the detergent solubilized preparation followed by Coomassie staining (Left) and heme staining (Right). b, SDS-PAGE of the SMA nanodiscs preparation followed by Coomassie staining (Left) and heme staining (Right). c, Mass spectrometry results for the ACIII-aa3 supercomplex preparations. d, Size exclusion chromatography with the ACIII-aa3 supercomplex from F.johnsoniae. (Top left) The detergent-solubilized sample showing traces for protein at 280 nm, heme c at 412 nm and heme a at 443 nm respectively. (Top right) The sample isolated using the SMA copolymer showing traces for protein at 280 nm, heme c at 410 nm and heme a at 605 nm. I and II are the two peaks corresponding to two populations of the supercomplex. (Bottom left) The fraction containing peak I. (Bottom right) The fraction containing peak II. e, BN-PAGE with the ACIII:aa3 supercomplex. Detergent solubilized ACIII-aa3 supercomplex showing a band at around 500 kDa, a smear of possible aggregates and possibly ACIII by itself. The supercomplex in SMA nanodiscs shows two different populations. 
f, BN-PAGE with the two different populations of ACIII:aa3 supercomplex in SMA nanodiscs purified from the size exclusion chromatography. The two chromatography peaks correspond to the two bands observed in the BN-PAGE. Data in a,b are representative of six independent experiments and those in d,e,f are representative of three independent experiments with similar results. Extended Data Figure 3 Functional assays of the ACIII/cyt aa 3 supercomplex a, The EPR spectrum of the air oxidized sample showing peaks of the [3Fe-4S]1+ cluster from ACIII, the CuA from the aa3 oxidase and low-spin hemes with overlapping g values. Insert is a zoomed view from 3000 G to 3500 G to better visualize the peaks from CuA (black arrows) and [3Fe-4S]1+ cluster. The region between 4000 G and 5000 G is magnified 10-times to show the broad gx trough of low-spin hemes. The measurement condition is 10 K, 9.267 GHz, 2 mW microwave power and 20 Gauss modulation. b, The EPR spectra of the ferricyanide oxidized sample at various temperatures. The measurement condition is 9.257 GHz, 2 mW microwave power and 5 Gauss modulation. c, The EPR spectrum of the air oxidized sample showing peaks of iron sulfur clusters from ACIII and low spin hemes. The measurement condition is 10 K, 9.427 GHz, 2 mW microwave power, 10 Gauss modulation. d, The EPR spectra of the air oxidized sample at various temperatures. The measurement condition is 9.427 GHz, 2 mW microwave power, 5 Gauss modulation. e, Redox titration of the hemes in the ACIII and the aa3 oxidase in supercomplex in DDM. The potentiometric titration of the c hemes (Top) from the ACIII and the a hemes (Bottom) from the aa3 oxidase. The Em values are indicated and the solid red line represents the Nernst fitting. f, Steady state activity of preparations of the ACIII/cyt aa 3 preparations (The number of independent experiments, n = 6 for ACIII in DDM and SMA nanodiscs and n = 3 for Peak I and Peak II). Data are means ± standard deviation (S.D.). 
Data in a-e are representative of three independent experiments with similar results.

Extended Data Figure 4 Single-particle cryo-EM of the ACIII-cyt aa3 supercomplex in SMA nanodiscs

a, Sum of an aligned movie of the ACIII-cyt aa3 supercomplex in an SMA nanodisc. The scale bar is 20 nm. b, Two-dimensional class averages; the scale bar represents 10 nm. c, Fourier shell correlation (FSC) curves between two independently refined half-maps for the ACIII-cyt aa3 map, the ACIII map and the combined map. d, Surface renderings of the maps colored according to local resolution; the scale bar is 5 nm. e, Euler angle distributions of the particles included in the calculation of the three final maps. Data collection and structure calculation were not repeated.

Extended Data Figure 5 Features observed in the cryo-EM density and the de novo structure of ACIII

a, Surface representations of ACIII, cyt aa3 and the ACIII-cyt aa3 supercomplex. The density threshold is the same for ACIII and cyt aa3. b, Different views of the ACIII density, colored by subunit. c, Two single-span transmembrane peptides of unknown origin and sequence, denoted ActX and ActY, are present in the structure in the vicinity of ActC. Each has been modeled as a polyalanine peptide. d, α-helices 2–9 of ActC form two four-helical up-and-down bundles, colored in two shades of blue. α-helices 1 and 10 are colored gray and unlabeled. e, ActB, shown in cartoon, is in contact with ActA, ActC, ActD, ActE and ActF. Surfaces are drawn from residues that are within 4 Å of ActB and colored according to their chain. f, The transmembrane α-helices of ActC and ActF are arranged with pseudo two-fold rotational symmetry. g, Side-by-side comparison of the polysulfide reductase (PDB 2VPZ) and the assembly of ActB and ActC. The two structures are aligned based on PsrB, the domain containing four Fe-S clusters.
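The FSC curves of Extended Data Figure 4c, and the 0.143 threshold used for the map resolutions, come from a shell-wise correlation of the Fourier transforms of two half-maps. The following NumPy sketch shows that calculation under simplifying assumptions: cubic maps, integer-radius shells, no masking, and a resolution read off at the first shell below the threshold without interpolation. It is an illustration, not the cryoSPARC procedure used by the authors, and the function names are hypothetical.

```python
import numpy as np


def fourier_shell_correlation(map1, map2):
    """Return the FSC per integer-radius shell for two equally sized cubic maps."""
    assert map1.shape == map2.shape and len(set(map1.shape)) == 1
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)

    # Integer shell index (radius in Fourier voxels) of every coefficient.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for r in range(n_shells):
        sel = shell == r
        num = np.real(np.sum(f1[sel] * np.conj(f2[sel])))
        den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(np.asarray(fsc) < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None  # the curve never crosses the threshold at a usable shell
    # Shell r corresponds to a spatial frequency of r / (box_size * pixel_size).
    return box_size * pixel_size / below[0]


# A map correlated with itself gives FSC = 1 in every shell.
rng = np.random.default_rng(0)
demo_map = rng.normal(size=(16, 16, 16))
fsc_self = fourier_shell_correlation(demo_map, demo_map)

# Toy curve with a made-up box size; 1.1 Å is the pixel size of the data set.
demo_res = resolution_at_threshold([1.0, 0.9, 0.5, 0.1, 0.0], box_size=100, pixel_size=1.1)
```

Production software usually applies a soft mask and corrects for its effect before reading off the resolution, which this sketch omits.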
Extended Data Figure 6 quinone pocket in ActC a, sequence alignment of the ActC from Flavobacterium johnsoniae, Rhodothermus marinus, and Chloroflexus aurantiacus. The transmembrane α-helices are labelled based on the structure of ACIII from Flavobacterium johnsoniae. The black arrows point to conserved polar residues that are within 15 Å of the [3Fe-4S] cluster in ActB. b, Proposed quinone pocket based on the arrangement of conserved polar residues. c, Different views of the proposed quinone pocket with a docked menaquinone-1 molecule. Hydrophobic residues near the menaquinone-1 (MK-1) head group are also shown. The crevice between α-helix 3 and α-helix 4 is a putative quinone entry pathway. Extended Data Figure 7 Fitting of ACIII structure to cryo-EM density a, Fitting of cofactors into the cryo-EM density. The blue mesh is drawn with a higher density threshold to reveal metal centers. The numberings of nearby amino acid residues, which are shown along with these cofactors, are listed below each cofactor. b, fitting of different secondary structure elements to cryo-EM density. c, Eleven identified lipids are modelled as phosphatidylethanolamine molecules. d, The triacylated cysteine at the N terminus of ActE shown along with 15 downstream amino acids. Notably, residue Tyr28E is in contact with the covalent lipid of ActE. Attachment of ActE to the membrane may also be assisted by aromatic residues Tyr30E and Phe31E, which appear to be inserted into the lipid bilayer. Throughout the MD simulation trajectory, these residues remain buried in the lipid bilayer. Extended Data Figure 8 Protein stability and lipid-protein interaction analysis based on molecular dynamics simulations a, Root-mean-square deviation (RMSD) of the protein backbone heavy atoms for the entire ACIII complex and each subunit, aligned based on ACIII backbone heavy atoms from three independent molecular dynamics simulations. b, Same as a, but aligned using the backbone heavy atoms of each subunits. 
c, Superposition of the initial (black) and final conformation (colored) of each subunit after 250 ns of simulation (aligned using backbone heavy atoms). d, The lipid-protein contact number defined by the number of lipid atoms within 4 Å of the protein atoms calculated over the time course of the simulation. This contact number is either calculated for the eleven lipids resolved by cryo-EM (top) or all membrane lipids (bottom). e, The lipid-protein contact number for each of the eleven cryo-EM resolved lipids. f, Isosurfaces (50%) of the atom occupancy map for the lipid anchors (orange), cryo-EM resolved lipids (red), and other membrane lipids (purple), calculated using the last 230 ns simulation trajectory. The stronger the lipid-protein interactions, the longer the local residence time, which leads to higher atom-occupancy values. ACIII subunits C, D, and F are shown in silver. For all plots, the raw data are shown as translucent thin lines and the block-averages are shown as dark lines. Extended Data Figure 9 Structural basis for the supercomplex formation between the ACIII and the cyt aa 3 a, Two contact areas between the ACIII and the cyt aa 3: the transmembrane portion of subunit III (red) and the loop from subunit III (orange). A homology model of subunit III fits the transmembrane density. The loop is modeled to the cryo-EM density. The sequence of the peptide is also shown. Tryptophan 188 and phenylalanine 189 are used to register the density to the sequence. b, Model of the ACIII-cyt aa 3 supercomplex. The cyt aa 3 structure from Rb. sphaeroides was positioned based on the transmembrane portion of subunit III. α-helices 1 and 2 of subunit III are missing in F. johnsoniae to avoid steric clashes with the ACIII structure. c, Sequence alignment of subunit III with a long loop (highlighted with the orange bar) between α-helix 5 and α-helix 6 (numbered according to subunit III from Rb. sphaeroides). Tryptophan 188 (black arrow) is conserved. 
d, Sequence alignment of ActB from organisms with a long loop in their subunit III of cyt aa3 oxidase. Arginine 868 (red arrow) is largely conserved with occasional substitution to lysine.

Extended Data Table 1 Cryo-EM data collection, refinement and validation statistics

| | Combined (EMDB-7286) (PDB 6BTM) | ACIII-cyt aa3 (EMDB-7447) | ACIII (EMDB-7448) |
|---|---|---|---|
| Data collection and processing | | | |
| Magnification | 75,000× | 75,000× | 75,000× |
| Voltage (kV) | 300 | 300 | 300 |
| Electron exposure (e⁻/Å²) | 61 | 61 | 61 |
| Defocus range (μm) | 0.8–5.0 | 0.8–5.0 | 0.8–5.0 |
| Pixel size (Å) | 1.1 | 1.1 | 1.1 |
| Symmetry imposed | C1 | C1 | C1 |
| Initial particle images (no.) | 899,405 | 899,405 | 899,405 |
| Final particle images (no.) | 164,239 | 81,530 | 51,547 |
| Map resolution (Å) | 3.4* | 3.6* | 3.6* |
| FSC threshold | 0.143 | 0.143 | 0.143 |
| Map resolution range (Å) | 3.0–4.5 | 3.0–6.0 | 3.0–5.0 |
| Refinement | | | |
| Initial model used (PDB code) | N/A | N/A | N/A |
| Model resolution (Å) | 3.7† | N/A | N/A |
| FSC threshold | 0.5 | | |
| Model resolution range (Å) | 3.7 | N/A | N/A |
| Map sharpening B factor (Å²) | −150.1 | −132.4 | −129.1 |
| Model composition | | | |
| Non-hydrogen atoms | 18,935 | | |
| Protein residues | 2,361 | | |
| Ligands | 10 | | |
| B factors (Å²) | | | |
| Protein | 126.4‡ | | |
| Ligand | 119.7‡ | | |
| R.M.S. deviations | | | |
| Bond lengths (Å) | 0.009 | | |
| Bond angles (°) | 1.26 | | |
| Validation | | | |
| MolProbity score | 1.36 | | |
| Clashscore | 1.58 | | |
| Poor rotamers (%) | 0.15 | | |
| Ramachandran plot | | | |
| Favored (%) | 92.7 | | |
| Allowed (%) | 7.3 | | |
| Disallowed (%) | 0.0 | | |

* Determined with cryoSPARC
† Determined with Phenix.mtriage
‡ Mean value of the B factors determined with Phenix.real_space_refine

Supplementary Material

Supplementary Discussion

          Most cited references (49)

          Function minimization by conjugate gradients

            A Bayesian View on Cryo-EM Structure Determination

            Introduction With recent reports on near-atomic-resolution (i.e., 3–4 Å) structures for several icosahedral viruses and resolutions in the range of 4–6 Å for complexes with less or no symmetry, cryo-electron microscopy (cryo-EM) single-particle analysis has entered the exciting stage where it may be used for de novo generation of atomic models. 1 However, the observation that reported resolutions vary significantly for maps with otherwise similar features 2 is an indication that existing reconstruction methods suffer from different degrees of overfitting. Overfitting occurs when the reconstruction describes noise instead of the underlying signal in the data, and often, these noisy features are enhanced during iterative refinement procedures. Thereby, overfitting is not merely an issue of comparing the resolution of one reconstruction with another but represents a major obstacle in the objective analysis of cryo-EM maps. In particular, without a useful cross-validation tool, such as the free R-factor in X-ray crystallography, 3 overfitting may remain undetected and a map may be interpreted at a resolution where the features are mainly due to noise. At the heart of the problem lies the indirectness of the experimental observations. A reasonably good model is available for the image formation process. Given a three-dimensional (3D) structure, this so-called forward model describes the appearance of the experimental images. However, the problem of single-particle reconstruction is the inverse one and is much more difficult to solve. The structure determination task is further complicated by the lack of information about the relative orientations of all particles and, in the case of structural variability in the sample, also their assignment to a structurally unique class. These data are lost during the experiment, where molecules in distinct conformations coexist in solution and adopt random orientations in the ice. 
In mathematics, this type of problem where part of the data is missing is called incomplete. Moreover, because the electron exposure of the sample needs to be strictly limited to prevent radiation damage, experimental cryo-EM images are extremely noisy. The high levels of noise together with the incompleteness of the data mean that cryo-EM structures are not fully determined by the experimental data and therefore prone to overfitting. In mathematical terms, the cryo-EM structure determination problem is ill-posed. Ill-posed problems can be tackled by regularization, where the experimental data are complemented with external or prior information so that the two sources of information together fully determine a unique solution. A particularly powerful source of prior information about cryo-EM reconstructions is smoothness. Because macromolecules consist of atoms that are connected through chemical bonds, the scattering potential will vary smoothly in space, especially at less than atomic resolution. The concept of imposing smoothness to prevent overfitting is widely used in the field through a variety of ad hoc filtering procedures. By limiting the power of the reconstruction at those frequencies where the signal-to-noise ratio (SNR) is low, these filters impose smoothness on the reconstructed density in real space. Traditionally, filtering procedures have relied on heuristics, that is, to some extent, existing implementations are all based on arbitrary decisions. Although potentially highly effective (and this is illustrated by the high-resolution structures mentioned above), the heuristics in these methods often involve the tuning of free parameters, such as low-pass filter shape and effective resolution (e.g., see Ref. 4). 
Thereby, the user (or, in some cases, the programmer) becomes responsible for the delicate balance between getting the most out of the data and limiting overfitting, which ultimately may lead to subjectivity in the structure determination process. Recent attention for statistical image processing methods 5 could be explained by a general interest in reducing the amount of heuristics in cryo-EM reconstruction procedures. Rather than combining separate steps of particle alignment, class averaging, filtering, and 3D reconstruction, each of which may involve arbitrary decisions, the statistical approach seeks to maximize a single probability function. Most of the statistical methods presented thus far have optimized a likelihood function, that is, one aims to find the model that has the highest probability of being the correct one in the light of the observed data. This has important theoretical advantages, as the maximum likelihood (ML) estimate is asymptotically unbiased and efficient. That is, in the limit of very large data sets, the ML estimate is as good as or better than any other estimate of the true model (see Ref. 6 for a recent review on ML methods in cryo-EM). In practice, however, data sets are not very large, and also in the statistical approach, the experimental data may need to be supplemented with prior information in order to define a unique solution. In Bayesian statistics, regularization is interpreted as imposing prior distributions on model parameters, and the ML optimization target may be augmented with such prior distributions. Optimization of the resulting posterior distribution is called regularized likelihood optimization, or maximum a posteriori (MAP) estimation (see Ref. 7). In this paper, I will show that MAP estimation provides a self-contained statistical framework in which the regularized single-particle reconstruction problem can be solved with only a minimal amount of heuristics. 
As a prior, I will use a Gaussian distribution on the Fourier components of the signal. Neither the use of this prior nor that of the Bayesian treatment of cryo-EM data is a new idea. Standard textbooks on statistical inference use the same prior in a Bayesian interpretation of the commonly used Wiener filter (e.g., see Ref. 7, pp. 549–551), and an early mention of MAP estimation with a Gaussian prior in the context of 3D EM image restoration was given by Carazo. 8 Nevertheless, even though these ideas have been around for many years, the Bayesian approach has thus far not found wide-spread use in 3D EM structure determination (see Ref. 9 for a recent application). This limited use contrasts with other methods in structural biology. Recently, Bayesian inference was shown to be highly effective in NMR structure determination, 10 while the Bayesian approach was introduced to the field of X-ray crystallography many years ago 11 and MAP estimation is now routinely used in crystallographic refinement. 12 In what follows, I will first describe some of the underlying theory of existing cryo-EM structure determination procedures to provide a context for the statistical approach. Then, I will derive an iterative MAP estimation algorithm that employs a Gaussian prior on the model in Fourier space. Because statistical assumptions about the signal and the noise are made explicit in the target function, straightforward calculus in the optimization of this target leads to valuable new insights into the optimal linear (or Wiener) filter in the context of 3D reconstruction and the definition of the 3D SNR in the Fourier transform of the reconstruction. Moreover, because the MAP algorithm requires only a minimum amount of heuristics, arbitrary decisions by the user or the programmer may be largely avoided, and objectivity may be preserved. 
I will demonstrate the effectiveness of the statistical approach by application to three cryo-EM data sets and compare the results with those obtained using conventional methods. Apart from overall improvements in the reconstructed maps and the ability to detect smaller classes in structurally heterogeneous data sets, the statistical approach reduces overfitting and provides reconstructions with more reliable resolution estimates.

Theory

Conventional methods

Many different procedures have been implemented to determine 3D structures from cryo-EM projection data. The following does not seek to describe all of them but, rather, aims to provide an accessible introduction to the Bayesian approach described below. For an extensive review of existing cryo-EM methods, the reader is referred to the book by Frank 4 or to the more recent volumes 481–483 of the book series Methods in Enzymology. 13 Almost all existing implementations for cryo-EM structure determination employ the so-called weak-phase object approximation, which leads to a linear image formation model in Fourier space:

$$X_{ij} = \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi}_{jl} V_l + N_{ij} \tag{1}$$

where:

• $X_{ij}$ is the jth component, with $j = 1,\dots,J$, of the two-dimensional (2D) Fourier transform of the ith experimental image $X_i$, with $i = 1,\dots,N$.

• $\mathrm{CTF}_{ij}$ is the jth component of the contrast transfer function for the ith image. Some implementations, such as EMAN, 14 include an envelope function on the contrast transfer function (CTF) that describes the fall-off of signal with resolution. Other implementations, such as FREALIGN, 15 ignore envelope functions at this stage and correct for signal fall-off through B-factor sharpening of the map after refinement. 16 The latter intrinsically assumes identical CTF envelopes for all images.

• $V_l$ is the lth component, with $l = 1,\dots,L$, of the 3D Fourier transform $V$ of the underlying structure in the data set. Estimating $V$ is the objective of the structure determination process. For the sake of simplicity, only the structurally homogeneous case is described here. Nevertheless, Eq. (1) may be expanded to describe structural heterogeneity, that is, data sets that contain more than one underlying 3D structure, by adding a subscript: $V_k$, with $k = 1,\dots,K$. Often, $K$ is assumed to be known, 17 so that each experimental image can be described as a projection of one of $K$ different structures, each of which needs to be estimated from the data.

• $P^{\phi}$ is a $J \times L$ matrix of elements $P^{\phi}_{jl}$. The operation $\sum_{l=1}^{L} P^{\phi}_{jl} V_l$ for all $j$ extracts a slice out of the 3D Fourier transform of the underlying structure, and $\phi$ defines the orientation of the 2D Fourier transform with respect to the 3D structure, comprising a 3D rotation and a phase shift accounting for a 2D origin offset in the experimental image. Similarly, the operation $\sum_{j=1}^{J} P^{\phi T}_{lj} X_{ij}$ for all $l$ places the 2D Fourier transform of an experimental image back into the 3D transform. According to the projection-slice theorem, these operations are equivalent to the real-space projection and “back-projection” operations. Some implementations calculate (back)-projections in real space, such as XMIPP; 18 other implementations, such as FREALIGN, 15 perform these calculations in Fourier space.

• $N_{ij}$ is noise in the complex plane. Although explicit assumptions about the statistical characteristics of the noise are not often reported, commonly employed Wiener filters and cross-correlation goodness-of-fit measures rely on the assumption that the noise is independent and Gaussian distributed.

After selection of the individual particles from the digitized micrographs, the experimental observations comprise $N$ images $X_i$. From the micrographs, one may also calculate the CTFs, which are then kept constant in most procedures.
The estimation of $V$ from all $X_i$ and $\mathrm{CTF}_i$ is then typically accomplished by an iterative procedure (called refinement) that requires an initial, often low-resolution, 3D reference structure $V^{(0)}$. As this paper is primarily concerned with refinement, the reader is referred to the books mentioned above for more information about how these starting models may be obtained. At every iteration $(n)$ of the refinement process, projections of $V^{(n)}$ are calculated for many different orientations $\phi$ and compared with each of the experimental images. Based on some goodness-of-fit measure, an optimal orientation $\phi_i^{*}$ is assigned to each image. All images are then combined into a 3D reconstruction that yields the updated model $V^{(n+1)}$. Many different reconstruction algorithms are available, but their description falls outside the scope of this paper (again, the reader is referred to the books mentioned above). In what follows, I will focus on a class of algorithms that has been termed direct Fourier inversion and will mostly ignore complications due to interpolations and nonuniform sampling of Fourier space. The update formula for $V$ may then be given by (for all $l$):

$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^{*} T}_{lj}\, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^{*} T}_{lj}\, \mathrm{CTF}_{ij}^{2}} \tag{2}$$

and this procedure is typically repeated until changes in $V$ and/or $\phi_i^{*}$ become small. It is important to realize that this refinement is a local optimization procedure that is prone to becoming stuck in local minima (and the same is true for the statistical approach outlined below). Consequently, the initial reference structure $V^{(0)}$ may have an important effect on the outcome of the refinement, as wrong initial models could lead to incorrect solutions.
Still, if one ignores local minima and if the goodness-of-fit measure used in the assignment of all $\phi_i^{*}$ is a least-squares or cross-correlation criterion, then one could argue that this procedure provides a least-squares estimate of the true 3D structure. However, as explained in Introduction, the observed data alone are not sufficient to uniquely determine the correct solution. Consequently, without the inclusion of additional, prior information $V$ may become very noisy, especially at frequencies where many CTFs have zero or small values and at high frequencies where SNRs are lowest. Many existing implementations reduce the noise levels in $V$ by means of a so-called Wiener filter. This image restoration method is based on minimization of the mean-square error between the estimate and the true signal and effectively regularizes the ill-posed problem by introducing prior knowledge about the correlation structure of the signal and the noise. 8 Most often, Wiener filter expressions are given for the case of 2D averaging, as relatively little work is published on the Wiener filter for 3D reconstruction. 19 If one assumes that both the signal and the noise are independent and Gaussian distributed with power spectra $\tau^{2}(\upsilon)$ for the signal and power spectra $\sigma_i^{2}(\upsilon)$ for the noise, with $\upsilon$ being the frequency, then (variants of) the following expression for the Wiener filter for 2D averaging are often reported: 20

$$A_j = \frac{\sum_{i=1}^{N} \frac{\tau^{2}(\upsilon)}{\sigma^{2}(\upsilon)}\, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \frac{\tau^{2}(\upsilon)}{\sigma^{2}(\upsilon)}\, \mathrm{CTF}_{ij}^{2} + 1} \tag{3}$$

where $A_j$ is the jth component of the 2D Fourier transform of average image $A$. The addition of one in the denominator of Eq. (3) reduces noise by reducing the power in the average for those Fourier components where $\sum_{i=1}^{N} \frac{\tau^{2}(\upsilon)}{\sigma^{2}(\upsilon)}\, \mathrm{CTF}_{ij}^{2}$ is small. One could discern two effects of the Wiener filter, the first of which is recognized much more often than the second.
(i) The Wiener filter corrects for the CTF, that is, $A$ will represent the original signal, unaffected by the CTF. (ii) The Wiener filter also acts as a low-pass filter. If one ignores the CTF in the Wiener filter expression by setting all $\mathrm{CTF}_{ij}$ in Eq. (3) equal to 1, then a filter remains that solely depends on the resolution-dependent SNR, $\tau^{2}(\upsilon)/\sigma^{2}(\upsilon)$. Since the SNR in cryo-EM images of macromolecular complexes typically drops quickly with resolution (e.g., see Fig. 3a), this will effectively be a low-pass filter. In the case of 3D reconstruction, consensus about the Wiener filter has not yet been reached, and existing implementations have worked around this problem by employing a variety of ad hoc procedures. 19 Two common approximations are to apply Wiener filtering to 2D (class) averages and/or to assume that $\tau^{2}(\upsilon)/\sigma^{2}(\upsilon)$ is a constant, the so-called Wiener constant. Examples of these two approximations may be found in EMAN 14 and FREALIGN, 15 respectively. If one assumes that the SNR is a constant $1/C$, then 3D reconstruction with Wiener filtering has been expressed as (e.g., see Ref. 15):

$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^{*} T}_{lj}\, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^{*} T}_{lj}\, \mathrm{CTF}_{ij}^{2} + C} \tag{4}$$

In many software packages, the heuristics in the Wiener filter implementation have resulted in additional free parameters, such as the Wiener constant ($C$). Moreover, as existing implementations typically fail to adequately reproduce the low-pass filtering effect of the true Wiener filter, it is common practice to apply ad hoc low-pass filters to $V$ during the iterative refinement. This typically involves the tuning of even more parameters, such as effective resolution and filter shape. Suboptimal use of these arbitrary parameters may lead to the accumulation of noise in the reconstructed density and overfitting of the data.
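The two effects of the 2D Wiener filter in Eq. (3), CTF correction and SNR-dependent damping, can be checked numerically with a small NumPy sketch. The function name, the array layout (N images by J Fourier components) and the toy values are assumptions for illustration; the SNR is passed in as a known per-component quantity, which in practice it is not.

```python
import numpy as np


def wiener_average(images_ft, ctfs, snr):
    """Eq. (3): A_j = sum_i snr_j*CTF_ij*X_ij / (sum_i snr_j*CTF_ij**2 + 1)."""
    images_ft = np.asarray(images_ft, dtype=complex)  # shape (N, J)
    ctfs = np.asarray(ctfs, dtype=float)              # shape (N, J)
    snr = np.asarray(snr, dtype=float)                # shape (J,), tau^2 / sigma^2
    numerator = np.sum(snr * ctfs * images_ft, axis=0)
    denominator = np.sum(snr * ctfs**2, axis=0) + 1.0
    return numerator / denominator


# Toy, noise-free example: two images, three Fourier components.
true_signal = np.array([1.0, 2.0, 3.0], dtype=complex)
ctfs = np.array([[1.0, 0.5, 0.0],
                 [1.0, -0.5, 0.0]])
images_ft = ctfs * true_signal       # X_ij = CTF_ij * S_j with no noise added
snr = np.array([100.0, 1.0, 1.0])    # assumed SNR per component

average = wiener_average(images_ft, ctfs, snr)
print(average)
```

The first component (high SNR, CTF of 1) is recovered almost exactly, the second is damped from 2 to 2/3 because its SNR is low, and the third, where every CTF is zero, is set to zero instead of producing a division by zero.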
Consequently, a certain level of expertise is typically required to obtain the optimal estimate of $V$, which may ultimately lead to subjectivity in the cryo-EM structure determination process.

A Bayesian view

The statistical approach explicitly optimizes a single target function. Imagining an ensemble of possible solutions, the reconstruction problem is formulated as finding the model with parameter set $\Theta$ that has the highest probability of being the correct one in the light of both the observed data $X$ and the prior information $Y$. According to Bayes' law, this so-called posterior distribution factorizes into two components:

$$P(\Theta \mid X, Y) \propto P(X \mid \Theta, Y)\, P(\Theta \mid Y) \tag{5}$$

where the likelihood $P(X \mid \Theta, Y)$ quantifies the probability of observing the data given the model, and the prior $P(\Theta \mid Y)$ expresses how likely that model is given the prior information. The model $\hat{\Theta}$ that optimizes $P(\Theta \mid X, Y)$ is called the MAP estimate. [Note that previously discussed ML methods optimize $P(X \mid \Theta, Y)$.] The statistical approach employs the same image formation model as described in Eq. (1) but explicitly assumes that all noise components $N_{ij}$ are independent and Gaussian distributed. The variance $\sigma_{ij}^{2}$ of these noise components is unknown and will be estimated from the data. Variation of $\sigma_{ij}^{2}$ with resolution allows the description of nonwhite or colored noise. The assumption of independence in the noise allows the probability of observing an image given its orientation and the model to be calculated as a multiplication of Gaussians over all its Fourier components, 21 so that:

$$P(X_i \mid \phi, \Theta, Y) = \prod_{j=1}^{J} \frac{1}{2\pi\sigma_{ij}^{2}} \exp\left( \frac{\left| X_{ij} - \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi}_{jl} V_l \right|^{2}}{-2\sigma_{ij}^{2}} \right) \tag{6}$$

The correct orientations $\phi$ for all images are not known. They are treated as hidden variables and are integrated out.
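Eq. (6) is a product over all Fourier components, so numerical work normally uses its logarithm, which turns the product into a sum and avoids underflow when J is large. The sketch below evaluates that log-likelihood for one image and one trial orientation. It assumes the reference slice $\sum_l P^{\phi}_{jl} V_l$ has already been extracted and is passed in as `projection`; the function name and the toy values are hypothetical.

```python
import numpy as np


def log_likelihood(x, ctf, projection, sigma2):
    """log P(X_i | phi, Theta, Y) from Eq. (6) for one image and one orientation.

    `projection` holds the already-extracted slice sum_l P^phi_jl V_l.
    """
    x = np.asarray(x, dtype=complex)
    projection = np.asarray(projection, dtype=complex)
    ctf = np.asarray(ctf, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    residual2 = np.abs(x - ctf * projection) ** 2
    # log of prod_j 1/(2*pi*sigma2_j) * exp(-residual2_j / (2*sigma2_j))
    return float(np.sum(-np.log(2.0 * np.pi * sigma2) - residual2 / (2.0 * sigma2)))


# Toy example: an image with two Fourier components and unit noise variance.
x = np.array([1.0 + 1.0j, 2.0 + 0.0j])
ctf = np.array([1.0, 1.0])
sigma2 = np.array([1.0, 1.0])

ll_match = log_likelihood(x, ctf, x, sigma2)               # perfect reference
ll_mismatch = log_likelihood(x, ctf, np.zeros(2), sigma2)  # empty reference
```

A reference that reproduces the image exactly leaves only the normalization term, and any mismatch lowers the value by the squared residual divided by twice the noise variance.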
The corresponding marginal likelihood function of observing the entire data set X is then given by:

$$P(X \mid \Theta, Y) = \prod_{i=1}^{N} \int_{\phi} P(X_i \mid \phi, \Theta, Y)\, P(\phi \mid \Theta, Y)\, d\phi \tag{7}$$

where P(ϕ|Θ,Y) expresses prior information about the distribution of the orientations. These distributions may include Gaussian distributions on the origin offsets (e.g., see Ref. 6) but their exact expression and the corresponding parameters will be ignored in what follows. Calculation of the prior relies on the assumption of smoothness in the reconstruction. Smoothness is encoded in the assumption that all Fourier components $V_l$ are independent and Gaussian distributed with zero mean and unknown variance $\tau_l^2$, so that:

$$P(\Theta \mid Y) = \prod_{l=1}^{L} \frac{1}{2\pi\tau_l^{2}} \exp\left( \frac{|V_l|^{2}}{-2\tau_l^{2}} \right) \tag{8}$$

The assumption of zero-mean Fourier components of the underlying 3D structures may seem surprising at first. However, given that Fourier components may point in any (positive or negative) direction in the complex plane, their expected value in the absence of experimental data will indeed be zero. The regularizing behavior of this prior is actually through its scale parameter $\tau_l^2$. By imposing small values of $\tau_l^2$ on high-frequency components of V, one effectively limits the power of the signal at those frequencies, which acts like a low-pass filter in removing high-frequency noise, and thus imposes smoothness. Note that the explicit assumptions of independent, zero-mean Gaussian distributions for both the signal and the noise in the statistical approach are the same ones that underlie the derivation of the Wiener filter described above. Eqs. (6–8) together define the posterior distribution as given in Eq. (5). For a given set of images $X_i$ and their CTFs, one aims to find the best values for all $V_l$, $\tau_l^2$, and $\sigma_{ij}^2$. Optimization by expectation maximization 22 yields the following algorithm (also see Fig.
1):

$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \int_{\phi} \Gamma_{i\phi}^{(n)} \sum_{j=1}^{J} \mathbf{P}^{\phi\,T}_{lj} \dfrac{\mathrm{CTF}_{ij}\, X_{ij}}{\sigma_{ij}^{2\,(n)}}\, d\phi}{\sum_{i=1}^{N} \int_{\phi} \Gamma_{i\phi}^{(n)} \sum_{j=1}^{J} \mathbf{P}^{\phi\,T}_{lj} \dfrac{\mathrm{CTF}_{ij}^{2}}{\sigma_{ij}^{2\,(n)}}\, d\phi + \dfrac{1}{\tau_l^{2\,(n)}}} \tag{9}$$

$$\sigma_{ij}^{2\,(n+1)} = \frac{1}{2} \int_{\phi} \Gamma_{i\phi}^{(n)} \left| X_{ij} - \mathrm{CTF}_{ij} \sum_{l=1}^{L} \mathbf{P}^{\phi}_{jl} V_l^{(n)} \right|^{2} d\phi \tag{10}$$

$$\tau_l^{2\,(n+1)} = \frac{1}{2} \left| V_l^{(n+1)} \right|^{2} \tag{11}$$

where $\Gamma_{i\phi}^{(n)}$ is the posterior probability of ϕ for the ith image, given the model at iteration number (n), which is calculated as:

$$\Gamma_{i\phi}^{(n)} = \frac{P(X_i \mid \phi, \Theta^{(n)}, Y)\, P(\phi \mid \Theta^{(n)}, Y)}{\int_{\phi'} P(X_i \mid \phi', \Theta^{(n)}, Y)\, P(\phi' \mid \Theta^{(n)}, Y)\, d\phi'} \tag{12}$$

Just like in related ML methods, 6 rather than assigning an optimal orientation $\phi_i^{*}$ to each image, probability-weighted integrals over all possible orientations are calculated. Apart from that, Eq. (9) bears obvious resemblance to previously reported expressions of the Wiener filter for 3D reconstruction [see Eq. (4)]. This may not come as a surprise, since both derivations were based on the same image formation model and the same statistical assumptions about the signal and the noise. However, Eq. (9) was derived by straightforward optimization of the posterior distribution and does not involve any arbitrary decisions. As is typical for parameter estimation inside the expectation–maximization algorithm, both the power of the noise and the power of the signal are learned from the data in an iterative manner through Eqs. (10) and (11), respectively. The result is that Eq. (9) will yield an estimate of V that is both CTF corrected and low-pass filtered, and in which uneven distributions of the orientations of the experimental images are taken into account. As such, to my knowledge, this expression provides the first implementation of the intended meaning of the Wiener filter in the case of 3D reconstruction. The relative contribution of the two additive terms in the denominator of Eq.
(9) also gives an objective indication of the SNR at any point in the 3D Fourier transform of the resulting reconstruction. Under the assumptions made above, for Fourier components where both terms are equal, the power of the noise in the reconstruction is expected to be as high as the power of the signal, that is, SNR = 1. Again, the statistical approach yields a result that is similar but not equivalent to that of existing approaches. The ratio of these two terms is most similar to the previously defined 3D spectral signal-to-noise ratio 23 but provides additional insights into how to take the CTFs into account. To avoid confusion with previously reported SSNR definitions, I will use the notation $\mathrm{SNR}_l^{\mathrm{MAP}}$ for the SNR in the MAP estimate. Straightforward rewriting yields the following expression:

$$\mathrm{SNR}_l^{\mathrm{MAP}} = \tau_l^{2} \sum_{i=1}^{N} \int_{\phi} \Gamma_{i\phi}^{(n)} \sum_{j=1}^{J} \mathbf{P}^{\phi\,T}_{lj} \frac{\mathrm{CTF}_{ij}^{2}}{\sigma_{ij}^{2}}\, d\phi \tag{13}$$

The $\mathrm{SNR}_l^{\mathrm{MAP}}$ yields a resolution estimate that varies in 3D Fourier space (i.e., with l), depending on the power of the signal, the power of the noise, the CTFs, and the orientational distribution of the 2D experimental images. However, often, a single value for the resolution of a given reconstruction is preferred. Therefore, the resolution-dependent spherical average of $\mathrm{SNR}_l^{\mathrm{MAP}}$ may be useful. I will refer to this spherical average as the $\mathrm{SSNR}^{\mathrm{MAP}}$ and propose the highest resolution at which $\mathrm{SSNR}^{\mathrm{MAP}} \geq 1$ as an objective resolution criterion for a structure determined by MAP estimation. The iterative use of Eqs. (9–11) deserves further attention. The values of $\tau_l^{2\,(n)}$ are calculated directly from the squared amplitudes of $V_l^{(n)}$ and then used to calculate $V_l^{(n+1)}$ in the next iteration. For those l where $\mathrm{SNR}_l^{\mathrm{MAP}}$ is large, $V_l^{(n+1)}$ will be calculated as a weighted sum over the 2D experimental images, much like the unregularized ML methods or the reconstruction in Eq. (2). For those l where $\mathrm{SNR}_l^{\mathrm{MAP}}$ is small, the amplitudes of $V_l^{(n+1)}$ will be effectively dampened.
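The dampening described above can be made concrete for a single Fourier component. The following is a minimal sketch under strong simplifying assumptions that are not part of the original text: orientations are treated as known (so the posterior weights Γ collapse to 1 for the correct orientation), every image contributes exactly one observation to the component, and all observations share one noise variance. The function name `map_update` is hypothetical. Under these assumptions the MAP estimate equals the unregularized weighted average multiplied by SNR/(SNR + 1).

```python
def map_update(xs, ctfs, sigma2, tau2):
    """Simplified Eqs. (9) and (13) for one Fourier component.

    xs     : observed complex values X_ij contributing to this component
    ctfs   : matching real CTF values
    sigma2 : noise variance (assumed identical for all observations)
    tau2   : current estimate of the signal variance tau_l^2
    Returns the regularized estimate and the SNR of the MAP estimate.
    """
    numerator = sum(c * x for c, x in zip(ctfs, xs)) / sigma2
    data_term = sum(c * c for c in ctfs) / sigma2
    v_map = numerator / (data_term + 1.0 / tau2)
    snr_map = tau2 * data_term
    return v_map, snr_map

# Two observations of the same component, chosen so that SNR = 1.
xs = [2.0 + 0.0j, 2.0 + 0.0j]
ctfs = [1.0, 1.0]
v_map, snr_map = map_update(xs, ctfs, sigma2=2.0, tau2=1.0)
v_unregularized = sum(c * x for c, x in zip(ctfs, xs)) / sum(c * c for c in ctfs)

print(v_unregularized)   # plain CTF-weighted average
print(v_map)             # dampened by SNR / (SNR + 1)
print(snr_map)
```

With the toy numbers chosen here the two denominator terms are equal, so the regularized amplitude is half the unregularized one. Larger `tau2` or more observations push the estimate toward the plain weighted average.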
If refinement is started from a strongly low-pass filtered reference structure, τ l 2(1) (and thus SNR l MAP) will only be large for the lowest frequency terms. Dampening of all higher-resolution terms will therefore result in relatively low-resolution estimates of V during the initial iterations. Nevertheless, the resolution of the reconstruction may gradually improve, provided that the SNR in the experimental images is high enough and enough iterations are performed. At some point in the iterative process, the resolution will stop improving because averaging over the noisy higher-resolution Fourier components no longer yields sufficiently high values of SNR l MAP. There remains one problem with the direct implementation of Eqs. (9–11). Their derivation depends on the assumption of independence between Fourier components of the signal. This assumption is known to be a poor one because the signal, a macromolecular complex, has a limited support in real space. Consequently, the power in the signal will be underestimated, and the reconstruction will be oversmoothed. Because the assumptions of independence are crucial in the derivation of a computationally tractable algorithm, heuristics seemed the only reasonable solution to this problem. Therefore, in the calculations presented below, all estimates for τ l 2 were multiplied by a constant, T = 4, in an attempt to account for the correlations between Fourier components in the signal. As expected, values of T close to 1 were observed to yield reconstructions with suboptimal resolutions, whereas for values larger than four, noticeable amounts of overfitting were observed (results not shown). One could argue that heuristics in existing approaches have been traded for a similar heuristics in the statistical approach. 
However, the heuristics proposed here are clearly argued as a consequence of limitations in the adopted statistical assumptions, whereas the reasons for heuristics in existing implementations are often arbitrary. In addition, whereas the heuristics in other approaches often involve multiple parameters, the heuristics employed here involve only a single constant whose optimal value is not expected to change much for different data sets.

Results

The MAP approach was tested in three different scenarios, each comprising a different cryo-EM data set. The first scenario represents an extreme case of reconstruction from images of suboptimal quality and illustrates the potential pitfalls of undetected overfitting. The second scenario comprises a data set of typical size and quality and illustrates the potential benefits of the statistical approach for data that could nowadays be collected in many cryo-EM laboratories. The third scenario illustrates the effectiveness of the statistical approach in dealing with structurally heterogeneous data sets, that is, when more than one structure is present.

Reduced overfitting of data with low SNRs

The first test data set comprised 8403 archaeal thermosome particles. In a previous study, these data were judged to be of too low quality to allow reliable structure determination (Yebenes et al., unpublished data). Still, reference-free class averages showed 8-fold symmetric top views as well as asymmetric side views. Combination of these images led to an initial 3D map at 50 Å with C8 symmetry, and this symmetry was imposed during subsequent refinements. The initial map was first subjected to conventional refinement as implemented in the XMIPP package. 24 This implementation merely represents one of many available implementations for cryo-EM reconstruction and is not expected to perform significantly better or worse than most of them.
It comprises standard projection matching in polar coordinates, reconstruction by direct Fourier inversion, regularization by low-pass and Wiener filtering, and resolution estimation by Fourier shell correlation (FSC) between reconstructions of random halves of the data at every iteration. Based on the FSC = 0.5 criterion, the resulting reconstruction was estimated to have a resolution of 10 Å (Fig. 2a, broken green line), which might have been considered a reasonable result given that over 65,000 asymmetric units had been averaged. However, further analysis of the map revealed indications of severe overfitting, most notably a typical “hairy” aspect of the density, that is, with many high-resolution features superimposed on a low-resolution ghost of the initial model. In addition, the map lacked features one would expect at this resolution, for example, the presence of rod-shaped densities for α-helices (Fig. 2b, left). The presence of overfitting was confirmed by two completely independent refinements of random halves of the data that were started from the same initial model. Whereas these refinements yielded reconstructions with estimated resolutions of 11 and 12 Å, respectively, the two resulting maps correlated with each other only up to 30 Å (Fig. 2a, continuous green line). At this point, it should be noted that this degree of overfitting could probably have been avoided by careful low-pass filtering of the images prior to refinement and/or tuning of the parameters of the refinement protocol itself. However, such procedures were not performed in the MAP refinement described below, and they were deliberately omitted from the XMIPP refinement in order to illustrate the potential pitfalls of nonexpert use of conventional refinement strategies. Refinement of the same model by the MAP approach yielded a reconstruction for which the SSNRMAP dropped below 1 at 16 Å (Fig. 2a, broken red line) and which did not show strong indications of overfitting (Fig. 2b, right). 
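The Fourier shell correlation used for the resolution estimates above can be sketched as follows. This is an illustrative outline only: it assumes the Fourier components of two half-map reconstructions have already been grouped into resolution shells, and the function names `fsc_shell` and `resolution_at_threshold` as well as the toy curve are hypothetical. The threshold of 0.5 matches the FSC = 0.5 criterion quoted in the text.

```python
import math

def fsc_shell(a, b):
    """Normalized correlation between two lists of complex Fourier
    components that belong to the same resolution shell."""
    numerator = sum((x * y.conjugate()).real for x, y in zip(a, b))
    power_a = sum(abs(x) ** 2 for x in a)
    power_b = sum(abs(y) ** 2 for y in b)
    denominator = math.sqrt(power_a * power_b)
    return numerator / denominator if denominator > 0 else 0.0

def resolution_at_threshold(shell_resolutions, fsc_values, threshold=0.5):
    """Resolution (in Angstrom) of the last shell before the FSC first
    drops below the threshold; None if the first shell is already below.
    Shells are assumed to be ordered from low to high resolution."""
    last = None
    for resolution, value in zip(shell_resolutions, fsc_values):
        if value < threshold:
            break
        last = resolution
    return last

# Toy FSC curve; the numbers are made up for illustration.
resolutions = [50.0, 30.0, 20.0, 16.0, 12.0]
curve = [0.99, 0.90, 0.70, 0.55, 0.30]
print(resolution_at_threshold(resolutions, curve))
```

As the thermosome example shows, an FSC between halves that were refined together can be inflated by overfitting, so the same calculation is more informative when the two half-maps come from fully independent refinements.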
In this case, independent refinements of two random halves of the data resulted in reconstructions that both had an estimated resolution of 16 Å and which also correlated with each other up to 16 Å (Fig. 2a, continuous red line). Further analysis revealed that a dip at 20–30 Å resolution in the FSC curve between the two independently refined maps could be related to the observation that a majority of the CTFs passed through zero close to 30 Å resolution. Probably, due to a scarcity of experimental data at this resolution, overfitting was not completely abolished in the statistical approach. Still, compared to the conventional approach, overfitting was significantly reduced, resulting in a better map and a more reliable resolution estimate. Increased objectivity in map interpretation The second test data set comprised 50,000 unliganded GroEL particles that were randomly selected from an original data set of 284,742 particles. 25 After sorting and analysis of 2D class averages, 39,922 particles were selected for 3D reconstruction using either MAP estimation or conventional refinement in XMIPP. Refinements were performed imposing D7 symmetry, and a starting model was obtained by applying a strict  50-Å low-pass filter to the 7.8-Å reconstruction that was reported for the original data set (Electron Microscopy Data Bank ID: 1200). MAP refinement yielded a reconstruction for which the SSNRMAP dropped below one at a resolution of 8.0 Å (Fig. 3a, broken red line). Conventional projection matching in XMIPP gave a reconstruction with an estimated resolution of 8.8 Å (Fig. 3a, broken green line). The calculation of FSC curves between these maps and a fitted 2.9-Å GroEL crystal structure (Protein Data Bank ID: 1XCK, see Experimental Procedures) confirmed that MAP refinement had reached a higher resolution than the conventional approach (Fig. 3a, continuous lines). 
The favorable comparison in resolution with the XMIPP reconstruction (and with the reconstruction that was reported for the originally much larger data set) indicates that regularization with a Gaussian prior does not result in oversmoothing of the reconstruction. On the contrary, through optimal filtering of the reconstruction during the refinement, higher resolutions may be obtained than with conventional approaches. Prior to visualization, the reconstructed density maps were sharpened using the approach proposed by Rosenthal and Henderson. 16 Through the use of the density map generated from the crystal structure as a reference, which itself has an estimated B-factor of 250 Å2, application of this procedure led to estimated B-factors of 560 Å2 for the XMIPP-generated reconstruction and 715 Å2 for the reconstruction from the MAP approach. Analysis of the corresponding Guinier plots (Fig. 3b) shows that the power of the XMIPP-generated map is too strong both at low resolution and at high resolution. This suboptimal weighting of different resolutions may be attributed to heuristics employed in the Wiener filter. XMIPP uses a constant for the SNR term in the Wiener filter and sets its value in the same way as FREALIGN does. 26 As also mentioned above, a single value is, however, inadequate to describe the intrinsic 3D behavior of the SNR in Fourier space. The statistical approach does employ a full 3D SNR model, and the Guinier plot of the reconstruction generated by MAP refinement is in excellent agreement with the model from its lowest frequency terms almost up to its estimated resolution. At the high-resolution end, despite FSC weighting, 16 the XMIPP-generated map still has relatively strong features beyond its estimated resolution. The FSC curve with the crystal structure (Fig. 3a, continuous green line) indicates that these features are mainly due to noise. 
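The Guinier analysis mentioned above relates the logarithm of the amplitude to the inverse squared resolution. A minimal sketch of a B-factor estimate follows, assuming a pure Gaussian fall-off, F(d) = F0 exp(−B / (4 d²)), so that the slope of ln F against 1/d² equals −B/4. This is not the Rosenthal and Henderson procedure itself, which also involves FSC weighting and a reference curve; the function name `estimate_b_factor` and the synthetic data are assumptions for illustration.

```python
import math

def estimate_b_factor(resolutions, amplitudes):
    """Least-squares slope of ln(amplitude) versus 1/d^2, converted to a
    B-factor in square Angstrom under a Gaussian fall-off model."""
    xs = [1.0 / (d * d) for d in resolutions]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return -4.0 * slope

# Synthetic amplitudes generated with B = 560 square Angstrom.
resolutions = [20.0, 15.0, 12.0, 10.0, 8.0]
amplitudes = [math.exp(-560.0 / (4.0 * d * d)) for d in resolutions]
print(estimate_b_factor(resolutions, amplitudes))
```

Noise that survives beyond the true resolution limit flattens the high-resolution end of such a plot, which is one way to see why it leads to an underestimated B-factor.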
These noise features result in an underestimation of the B-factor in the Rosenthal and Henderson approach. On the contrary, the signal in the map generated by the statistical approach drops sharply near its estimated resolution limit, which is a direct consequence of the low-pass filtering effects of Eq. (9). Therefore, whereas interpretation of the XMIPP-generated map at too high resolutions would be subject to errors, interpretation of the reconstruction from the statistical approach is unambiguous. Comparison of the sharpened reconstructions with an  8-Å low-pass filtered map that was generated from the crystal structure confirms the good quality of the MAP reconstruction and illustrates the problems of the conventional approach (Fig. 3c). Classification of minority conformations The third test data set comprised 10,000 Escherichia coli ribosome particles that were proposed as a benchmark for 3D classification algorithms. 27 Supervised classification had previously suggested that 5000 of these particles correspond to ratcheted ribosomes in complex with elongation factor G (EF-G) and a single tRNA molecule, while the other 5000 particles were interpreted as unratcheted ribosomes without EF-G and in complex with three tRNAs. Various classification algorithms have been tested using this data set, and all of them have reported results similar to the ones obtained using supervised classification. 28–31 However, simultaneous refinement of K = 4 reconstructions in the MAP approach (see also Experimental Procedures) identified a third previously unobserved class. Whereas, as expected, the first two maps of this refinement were interpreted as 70S ribosomes in complex with EF-G, and the third map as a 70S ribosome without EF-G, the fourth map corresponded to a 50S ribosomal subunit (Fig. 4a). A second calculation with randomly different initial models yielded similar results, with a 94% overlap in the 50S class. 
Although this class contains only a small minority of the particles (i.e., 6%), visual inspection of these particles and their reference-free 2D class averages confirmed the existence of 50S particles in the data (cf. Fig. 4b and c). Note that, as expected, the effective resolution as measured by the SSNRMAP is much lower for the minority class (30 Å) than for the other three classes (20–21 Å), which is a direct consequence of the lower number of particles contributing to the term on the left-hand side of the denominator of Eq. (9). The absence of such class-specific regularization is likely to lead to very noisy reconstructions for small classes in existing classification approaches, which may explain their failure in identifying the 50S class. As in related ML classification approaches, 6 the number of classes K is assumed to be known, that is, this number needs to be provided by the user, but this assumption is hardly ever met. Often, comparing calculations with different values of K provides a useful band-aid, but admittedly, there is no well-established, objective criterion to decide on its optimal value. In this case, refinements with K = 3 were not successful in revealing the 50S class, but refinements with K = 5 did give results similar to the ones in Fig. 4, albeit with an additional class corresponding to the 70S ribosome without EF-G (results not shown). Discussion and Conclusions Because the accumulation of noise in cryo-EM reconstructions is a consequence of the ill-posed character of the reconstruction problem, which in turn is caused by the high noise levels and the incompleteness of the experimental data, one could discern three general ways of improving cryo-EM reconstructions. Firstly, lower noise levels in the data will reduce ill-posedness and thus lead to better reconstructions. In this light, ongoing developments to improve microscopes (e.g., see Ref. 32) and detectors (e.g., see Ref. 
33) are expected to make an important contribution to the field. Secondly, reducing incompleteness (due to unknown relative orientations) will also reduce ill-posedness and thus lead to better reconstructions. Obvious examples of less incomplete reconstruction problems are those where the molecules adopt some kind of internal symmetry, for example, helical assemblies or 2D crystals. It is therefore not surprising that, in particular for those systems, cryo-EM has been most successful in terms of resolution and map quality, 34 but also for reconstruction of asymmetric single particles, one might devise modifications of existing sample preparation protocols that somehow provide information on the orientations of the particles and thus reduce incompleteness (e.g., see Ref. 35) Thirdly, and this is the approach that has been taken in this paper, ill-posedness may be reduced by regularization, that is, the incorporation of prior information in the refinement. In this paper, the use of smoothness has been explored as a source of prior information about cryo-EM reconstructions. However, the observation that overfitting was not completely abolished in the thermosome example illustrates that smoothness alone might not be sufficient to fully determine a unique structure from very noisy cryo-EM data. One could envision the use of additional, more powerful sources of prior knowledge, such as non-negativity, solvent flatness, or ultimately the large amount of chemical knowledge that is available about proteins and nucleic acids. It might also be possible to identify alternative sources of prior knowledge from existing approaches that are aimed at reducing overfitting, such as that of Stewart et al. 36 The statistical framework described in this paper may be used to combine any source of prior information with the experimental data, provided that suitable numerical expressions may be formulated. 
In addition, it is foreseeable that the heuristics employed in this paper to prevent oversmoothing (multiplication of the estimates for $\tau_l^2$ with a constant) may be improved in the future. More detailed analyses of the correlations between Fourier components of macromolecules or the use of power spectra of previously determined structures may lead to better estimates for $\tau_l^2$. Meanwhile, reconstructions obtained by MAP refinement should report the value of T employed, and values much larger than 4 should probably be avoided. In general, the Bayesian view provides a rigorous theoretical framework for cryo-EM single-particle reconstruction, in which the explicit statistical assumptions can be criticized and, if possible, modified to provide better reconstructions. The procedures presented here render commonly employed heuristics in low-pass and Wiener filtering largely superfluous, as Bayes' law uniquely determines how observed experimental data should be combined with prior knowledge. As such, the Bayesian approach leaves little scope for arbitrary decisions by the user, which will alleviate the need for user expertise and ultimately contribute to increased objectivity in the reconstruction process.

Experimental Procedures

Cryo-electron microscopy

Thermosome complexes from the hyperthermophilic archaeon Thermococcus strain KS-1 containing only α-subunits 37 were imaged under low-dose conditions in a FEI T20 microscope at 200 kV and a magnification of 50,000×. Micrographs were recorded on photographic film, scanned using a Zeiss SCAI scanner with a pixel size of 7 μm, and subsequently downsampled by a factor of 2. Particles were picked manually and extracted in boxes of 120 × 120 pixels with a resulting pixel size of 2.8 Å. The GroEL data set used here is a random subset of the 284,742 particles described by Stagg et al.
25 In that study, data were collected in an automated manner using Leginon 38 on a FEI T20 microscope that was operated at 120 kV, and images were recorded on a 4k × 4k Gatan Ultrascan CCD at a magnification of 50,000×. Particles were selected automatically using template-based procedures and extracted in boxes of 128 × 128 pixels with a pixel size of 2.26 Å. The ribosome data set used here is a subset of the 91,114 particles described previously 21 and was downloaded from the Electron Microscopy Data Bank†. In this case, E. coli ribosomes in a pre-translational state were imaged under low-dose conditions on a FEI T20 electron microscope at 200 kV with a calibrated magnification of 49,650×. Particles were selected by preliminary automated particle picking, visual verification, and subsequent selection based on cross-correlation coefficient with a template. Particles were extracted in boxes of 130  × 130 pixels with a pixel size of 2.8 Å. Supervised classification had previously suggested that 5000 of the 10,000 particles used here correspond to unratcheted ribosomes without EF-G, and the other 5000 particles, to ratcheted ribosomes in complex with EF-G. Implementation The iterative algorithm in Eqs. (9–11) was implemented in a stand-alone computer program called RELION (REgularised LIkelihood OptimisatioN), which may be downloaded for free online‡. Although, for the sake of clarity, Eqs. (9–11) do not describe the case of simultaneous refinement of K different 3D models, derivation of the corresponding algorithm is straightforward. Moreover, the same theory may be used to derive the algorithm that simultaneously refines K 2D models. RELION implements both the 2D and the 3D cases of multi-reference refinement, and as such may be used for 3D classification of structurally heterogeneous data sets, as well as the calculation of 2D class averages. 
As was recognized by Sindelar and Grigorieff, 39 the power of the noise estimated from unmasked images is higher than that estimated from images that are masked to the area where the actual particle resides. Therefore, if unmasked images were used, this would lead to an overestimation of the noise by Eq. (10) and thus oversmoothing of the maps by Eq. (9). On the other hand, the use of masked images would lead to correlations between the assumedly independent Fourier components. All calculations presented in this paper were done using unmasked images, but an option to use masked images has also been implemented. Whereas Eq. (10) implies that the power of the noise is estimated as a 2D array for each experimental image, in practice, estimates for the power of the noise are obtained by averaging σ ij 2 over resolution rings and groups of images, for example, all images from a single micrograph. Also, estimates for the power of the signal, that is, τ l 2, are obtained by averaging over resolution shells. Note that, despite this averaging, SNR l MAP still varies in 3D depending on the orientational distribution of the images. Also, for the sake of simplicity, Eq. (9) does not reflect the corrections that are needed to account for interpolation operations and nonuniform sampling of the 3D Fourier transform. In practice, the 3D transform is oversampled three times, and projections as well as back-projections are performed by nearest-neighbor interpolation. An iterative gridding approach 40 is then used to deal with the nonuniform sampling of the oversampled 3D transform, prior to calculation of the inverse Fourier transform. Image processing All other image processing operations were performed in the XMIPP package. 18 Prior to refinement, all data sets were normalized using previously described procedures. 5 MAP refinements and projection matching refinements in XMIPP were performed with similar settings where possible. 
Although the implementation of the MAP approach readily handles anisotropic CTF models, all refinements were performed with isotropic CTFs (without envelope functions) for the sake of comparison with XMIPP. All orientational searches, or integrations in the statistical approach, were performed over the full five dimensions, that is, three Euler angles and two translations. For both the thermosome and the GroEL refinements, the first 10 iterations were performed with an angular sampling of 7.5°, and subsequent iterations were performed with an angular sampling interval of 3.75°. Thermosome refinements were stopped after 15 iterations, and GroEL refinements, after 20. Translational searches were limited to ± 10 pixels in both directions in the first 10 iterations and to ± 6 pixels in the subsequent iterations. Although it is common practice in XMIPP to reduce computational costs by breaking up the orientational search into separate rotational and translational searches and to limit rotational searches to local searches around previously determined orientations, this was not done in the refinements presented here for the sake of comparison with the MAP approach. Refinements with angular sampling intervals as fine as 1° where such tricks were employed did not result in better reconstructions (results not shown). The true resolution of the GroEL reconstructions was assessed by FSC with a published crystal structure (Protein Data Bank ID: 1XCK). This structure contains 14 unique monomers in its asymmetric unit. Each of these monomers was fitted separately into the reconstructions using UCSF Chimera, 41 and for each monomer, the equatorial, intermediate, and apical domains were allowed to move independently as rigid bodies. The resulting coordinates were converted to an electron density map that was symmetrized according to D7 symmetry. 
Optimization of the relative magnification between this map and the cryo-EM reconstructions revealed that the effective pixel size of the cryo-EM images was 2.19 Å, differing by 3% from the nominal value, and this value was used to generate all plots in Fig. 3. Ribosome refinements were performed for 25 iterations with an angular sampling of 7.5° and translational searches of ± 10 pixels. To generate K = 4 unsupervised initial starting models from a single  80-Å low-pass filtered initial ribosome structure, during the first iteration, we divided the data set into four random subsets in a way similar to that described before. 21
              A giant molecular proton pump: structure and mechanism of respiratory complex I.

              The mitochondrial respiratory chain, also known as the electron transport chain (ETC), is crucial to life, and energy production in the form of ATP is the main mitochondrial function. Three proton-translocating enzymes of the ETC, namely complexes I, III and IV, generate proton motive force, which in turn drives ATP synthase (complex V). The atomic structures and basic mechanisms of most respiratory complexes have previously been established, with the exception of complex I, the largest complex in the ETC. Recently, the crystal structure of the entire complex I was solved using a bacterial enzyme. The structure provided novel insights into the core architecture of the complex, the electron transfer and proton translocation pathways, as well as the mechanism that couples these two processes.
                Author and article information

                Journal
                0410462
                6011
                Nature
                0028-0836
                1476-4687
                21 March 2018
                25 April 2018
                May 2018
                25 October 2018
                : 557
                : 7703
                : 123-126
                Affiliations
                [1 ]Department of Biochemistry, University of Illinois, 600 S. Mathews Street, Urbana, IL 61801, USA
                [2 ]NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois, 405 N. Mathews, Urbana, IL 61801, USA
                [3 ]Molecular Medicine Program, The Hospital for Sick Children Research Institute, 686 Bay Street, Toronto, ON M5G 0A4, Canada
                [4 ]Center for Biophysics and Quantitative Biology, University of Illinois, 1110 Green Street, Urbana, IL 61801, USA
                [5 ]Department of Biochemistry, University of Mississippi Medical Center, 2500 N. State Street, Jackson, MS 39216, USA
                [6 ]Department of Medical Biophysics, The University of Toronto, Suite 15-701, 101 College Street Toronto, ON M5G 1L7, Canada
                [7 ]Department of Biochemistry, The University of Toronto, Room 5207, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
                Author notes
                [* ]Co-corresponding authors: J.L.R.: john.rubinstein@ 123456utoronto.ca , TEL: 416-813-7255, R.B.G.: r-gennis@ 123456illinois.edu , TEL: 217-333-9075, E.T.: emad@ 123456life.illinois.edu , TEL: 217-244-6941
                [a]

                Chang Sun, Samir Benlekbir and Padmaja Venkatakrishnan contributed equally to the work in this manuscript

                [b]

                Current address: Department of Microbiology and Molecular Genetics, UC Davis, One Shields Avenue, Davis, CA 95616

                Article
                NIHMS953209
                10.1038/s41586-018-0061-y
                6004266
                29695868
                04d1bea3-4638-4a18-8dfa-2dd526534283

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Information for reprints and permissions is available at www.nature.com/reprints.

                Categories
                Article
