      Is Open Access

      Improved maize reference genome with single-molecule technologies

      research-article


          Abstract

          An improved reference genome for maize, using single-molecule sequencing and high-resolution optical mapping, enables characterization of structural variation and repetitive regions, and identifies lineage expansions of transposable elements that are unique to maize.

          Supplementary information

          The online version of this article (doi:10.1038/nature22971) contains supplementary material, which is available to authorized users.

          A better map of the maize genome

          The maize genome was first reported in 2009, but that assembly had limited accuracy. Doreen Ware and colleagues report a new reference genome for maize built with single-molecule sequencing and high-resolution optical mapping. The new assembly improves the gene space, resolving gaps and misassemblies and correcting the order and orientation of genes. The authors also characterize structural variation and repetitive regions, and identify transposable element lineage expansions unique to maize.


          Abstract

          Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation [1]. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions [2]. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome [3], our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing [4]. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
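
The 52-fold gain in contig length reported in the abstract is a contiguity comparison between two assemblies. As a minimal sketch, assuming the comparison is made with the widely used contig N50 statistic (the abstract does not name the statistic), the following Python computes N50 for two sets of contig lengths and the fold change between them. The function name and all contig lengths are hypothetical and chosen only for illustration; they are not taken from either maize assembly.

```python
def contig_n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in base pairs (illustrative only).
old_assembly = [10, 20, 30, 40]
new_assembly = [500, 1000, 2000, 2500]

old_n50 = contig_n50(old_assembly)
new_n50 = contig_n50(new_assembly)
fold_increase = new_n50 / old_n50
print(f"old N50 = {old_n50}, new N50 = {new_n50}, fold = {fold_increase:.1f}")
```

N50 is preferred over the mean contig length in this kind of comparison because it is less sensitive to large numbers of very short contigs.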



          Most cited references (30)


          LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

          Background: Transposable elements are abundant in eukaryotic genomes and are believed to have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs). Results: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR transposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and specificity of 100% and 72%, respectively. This is comparable to or slightly better than annotations from previous software tools. The main advantage of LTRharvest over previous tools is (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software. Conclusion: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.
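
The sensitivity and specificity figures quoted in this abstract come from comparing predictions with a gold-standard annotation. The sketch below shows how such figures can be computed from sets of element identifiers. It assumes the convention common in annotation benchmarks, where "specificity" is computed as TP / (TP + FP) because true negatives are not well defined for predicted elements; the abstract does not state which definition the LTRharvest evaluation used. The element names and function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of gold-standard elements that were predicted."""
    return tp / (tp + fn)


def specificity_as_ppv(tp, fp):
    """Fraction of predictions that are in the gold standard
    (assumed definition; equivalent to precision)."""
    return tp / (tp + fp)


# Hypothetical gold-standard and predicted element identifiers.
gold = {"elem1", "elem2", "elem3", "elem4", "elem5"}
predicted = {"elem1", "elem2", "elem3", "elem4", "extra1"}

tp = len(gold & predicted)   # predicted and in the gold standard
fn = len(gold - predicted)   # in the gold standard but missed
fp = len(predicted - gold)   # predicted but not in the gold standard

sens = sensitivity(tp, fn)
spec = specificity_as_ppv(tp, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```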

            Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss.

            Ancient tetraploidies are found throughout the eukaryotes. After duplication, one copy of each duplicate gene pair tends to be lost (fractionate). For all studied tetraploidies, the loss of duplicated genes, known as homeologs, homoeologs, ohnologs, or syntenic paralogs, is uneven between duplicate regions. In maize, a species that experienced a tetraploidy 5-12 million years ago, we show that in addition to uneven ancient gene loss, the two complete genomes contained within maize are differentiated by ongoing fractionation among diverse inbreds as well as by a pattern of overexpression of genes from the genome that has experienced less gene loss. These expression differences are consistent over a range of experiments quantifying RNA abundance in different tissues. We propose that the universal bias in gene loss between the genomes of this ancient tetraploid, and perhaps all tetraploids, is the result of selection against loss of the gene responsible for the majority of total expression for a duplicate gene pair. Although the tetraploidy of maize is ancient, biased gene loss and expression continue today and explain, at least in part, the remarkable genetic diversity found among modern maize cultivars.

              Plant NBS-LRR proteins: adaptable guards

              Most of the disease resistance genes (R genes) in plants cloned to date encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins characterized by nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains as well as variable amino- and carboxy-terminal domains (Figure 1). These large, abundant, proteins are involved in the detection of diverse pathogens, including bacteria, viruses, fungi, nematodes, insects and oomycetes. There have been numerous extensive reviews since the first NBS-LRR-encoding genes were cloned from plants in 1994 (for example [1-5]). This article aims to provide a current overview of the structure and function of this protein family as well as to highlight recent advances. Plant NBS-LRR proteins are similar in sequence to members of the mammalian nucleotide-binding oligomerization domain (NOD)-LRR protein family (also called 'CARD, transcription enhancer, R (purine)-binding, pyrin, lots of leucine repeats' (CATERPILLER) proteins), which function in inflammatory and immune responses [6]. But although mammalian NOD-LRR proteins have the same tripartite domain organization as plant NBS-LRR proteins, including a nucleotide-binding domain and a LRR domain, the functional similarities between NBS-LRR and mammalian NOD proteins are probably the result of convergent evolution [7]. There are no NOD-related proteins in Caenorhabditis elegans or Drosophila melanogaster and the downstream partners of the two families differ [7,8]. The human NOD protein apoptotic protease activating factor 1 (APAF-1) has an NBS domain with greater protein-sequence similarity to plant NBS-LRR proteins than to other mammalian NOD proteins; however, it shares neither the amino-terminal nor the carboxy-terminal LRR domains characteristic of plant NBS-LRR proteins. Evolution and genome organization Plant NBS-LRR proteins are numerous and ancient in origin. They are encoded by one of the largest gene families known in plants. 
There are approximately 150 NBS-LRR-encoding genes in Arabidopsis thaliana, over 400 in Oryza sativa [3,9,10], and probably considerably more in larger plant genomes that have yet to be fully sequenced. Many NBS-encoding sequences have now been amplified from a diverse array of plant species using PCR with degenerate primers based on conserved sequences within the NBS domain and there are currently over 1,600 NBS sequences in public databases (Additional data file 1). They are found in non-vascular plants and gymnosperms as well as in angiosperms; orthologous relationships are difficult to determine, however, owing to lineage-specific gene duplications and losses [11,12]. In several lineages, NBS-LRR-encoding genes have become amplified, resulting in family-specific subfamilies (Figure 2; Additional data file 1) [13]. Of the 150 NBS-LRR sequences in Arabidopsis, 62 have NBS regions more similar to each other than to any other non-Brassica sequences (Figure 2; Additional data file 2). Different subfamilies have been amplified in the legumes (which includes beans), the Solanaceae (which includes tomato and potato), and the Asteraceae (which includes sunflower and lettuce) [13-15]. The spectrum of NBS-LRR proteins present in one species is not therefore characteristic of the diversity of NBS-LRR proteins in other plant families. NBS-LRR-encoding genes are frequently clustered in the genome, the result of both segmental and tandem duplications [3,10,16,17]. There can be wide intraspecific variation in copy number because of unequal crossing-over within clusters [18,19]. NBS-LRR-encoding genes have high levels of inter- and intraspecific variation but not high rates of mutation or recombination [19]. Variation is generated by normal genetic mechanisms, including unequal crossing-over, sequence exchange, and gene conversion, rather than genetic events particular to NBS-LRR-encoding genes [3,19-21]. 
The rate of evolution of NBS-LRR-encoding genes can be rapid or slow, even within an individual cluster of similar sequences. For example, the major cluster of NBS-LRR-encoding genes in lettuce includes genes with two patterns of evolution [19]: type I genes evolve rapidly with frequent gene conversions between them, whereas type II genes evolve slowly with rare gene conversion events between clades. This heterogeneous rate of evolution is consistent with a birth-and-death model of R gene evolution, in which gene duplication and unequal crossing-over can be followed by density-dependent purifying selection acting on the haplotype, resulting in varying numbers of semi-independently evolving groups of R genes [19,22]. The impact of selection on the different domains of individual NBS-LRR-encoding genes is also heterogeneous [19]. The NBS domain seems to be subject to purifying selection but not to frequent gene-conversion events, whereas the LRR region tends to be highly variable. Diversifying selection, as indicated by significantly elevated ratios of non-synonymous to synonymous nucleotide substitutions, has maintained variation in the solvent-exposed residues of the β-sheets of the LRR domain (see below) [19,23]. Unequal crossing-over and gene conversion have generated variation in the number and position of LRRs, and in-frame insertions and/or deletions in the regions between the β-sheets have probably changed the orientation of individual β-sheets. There are, on average, 14 LRRs per protein and often 5 to 10 sequence variants for each repeat; therefore, even within Arabidopsis, there is the potential for well over 9 × 10¹¹ variants, which emphasizes the highly variable nature of the putative binding surface of these proteins. There are two major subfamilies of plant NBS-LRR proteins, defined by the presence of Toll/interleukin-1 receptor (TIR) or coiled-coil (CC) motifs in the amino-terminal domain (Figure 1). 
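
The estimate of potential LRR variants quoted in this passage follows from simple combinatorics. As a rough sketch, assuming each of the 14 repeats varies independently (an assumption made here for illustration, not stated in the text), the number of combinations is the per-repeat variant count raised to the 14th power. The variable names are hypothetical.

```python
repeats = 14                 # average number of LRRs per protein (from the text)
low, high = 5, 10            # sequence variants per repeat (from the text)

# Independent combination across repeats (assumed): variants ** repeats.
lower_bound = low ** repeats     # 5 variants at every repeat
upper_bound = high ** repeats    # 10 variants at every repeat

# Per-repeat variant count implied by the quoted figure of 9 x 10^11.
quoted = 9e11
implied_variants_per_repeat = quoted ** (1 / repeats)

print(f"range: {lower_bound:.2e} to {upper_bound:.2e}")
print(f"implied variants per repeat: {implied_variants_per_repeat:.2f}")
```

Under this assumption the quoted figure lies between the two bounds and corresponds to roughly seven variants per repeat.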
Although TIR-NBS-LRR proteins (TNLs) and CC-NBS-LRR proteins (CNLs) are both involved in pathogen recognition, the two subfamilies are distinct both in sequence and in signaling pathways (see below) and cluster separately in phylogenetic analyses using their NBS domains (see Additional data file 2) [24,25]. TNLs are completely absent from cereal species, which suggests that the early angiosperm ancestors had few TNLs and that these were lost in the cereal lineage. The presence or absence of TNLs in basal monocots is not currently known. CNLs from monocots and dicots cluster together, indicating that angiosperm ancestors had multiple CNLs (Figure 2) [26]. There are also 58 proteins in Arabidopsis that are related to the TNL or CNL subfamilies but lack the full complement of domains [3,27]. These include 21 TIR-NBS (TN) and five CC-NBS (CN) proteins that have amino-terminal and NBS domains but lack a LRR domain [27]. The function of these proteins is not known, but they have the potential to act as adaptors or regulators of TNL and CNL proteins. Characteristic structural features NBS-LRR proteins are some of the largest proteins known in plants, ranging from about 860 to about 1,900 amino acids. They have at least four distinct domains joined by linker regions: a variable amino-terminal domain, the NBS domain, the LRR region, and variable carboxy-terminal domains (Figure 1). Four subfamilies of CNLs and eight subfamilies of TNLs were identified in Arabidopsis from sequence homology, motifs, intron positions and intron phase [3]. No crystal structures have been determined for any part of a plant NBS-LRR protein; crystal structures of mammalian NBS and LRR domains are, however, available as templates for homology-modeling approaches. The amino-terminal domain There is little experimental information on the function of the amino-terminal domain. In animals, the TIR domain is involved in signaling downstream of Toll-like receptors. 
Many plant NBS-LRR proteins are thought to monitor the status of ('guard') targets of pathogen virulence effectors (see below). Given the presence of TIR or CC motifs as well as the diversity of these domains, the amino termini are thought to be involved in protein-protein interactions, possibly with the proteins being guarded or with downstream signaling components [4]. Polymorphism in the TIR domain of the flax TNL protein L6 affects the specificity of pathogen recognition [28]. An alanine-polyserine motif that may be involved in protein stability is located immediately adjacent to the amino-terminal methionine in many TNLs (but not CNLs) in Arabidopsis [3]. Four conserved TIR motifs span 175 amino acids within the TIR domain of TNLs [27]. A CC motif is common but not always present in the 175 amino acids amino-terminal to the NBS of CNLs [3]. Some CNLs have large amino-terminal domains; tomato Prf, for example, has 1,117 amino acids amino-terminal of the NBS, much of which is unique to this protein. The NBS domain More is known of the structure and function of the NBS domain, which is also called the NB-ARC (nucleotide binding adaptor shared by NOD-LRR proteins, APAF-1, R proteins and CED4) domain. This domain contains several defined motifs characteristic of the 'signal transduction ATPases with numerous domains' (STAND) family of ATPases, which includes the mammalian NOD proteins [29,30]. STAND proteins function as molecular switches in disease signaling pathways. Specific binding and hydrolysis of ATP has been shown for the NBS domains of two tomato CNLs, I2 and Mi [31]. ATP hydrolysis is thought to result in conformational changes that regulate downstream signaling. The first report of NBS-LRR protein oligomerization, a critical event in signaling from mammalian NOD proteins, is the oligomerization of tobacco N protein (a TNL) in response to pathogen elicitors [32]. 
In Arabidopsis, eight conserved NBS motifs have been identified through analysis with MEME, a program for motif identification [3]. NBS domains of TNLs and CNLs are distinguished by the sequences of three resistance NBS (RNBS) motifs within them (RNBS-A, RNBS-C, and RNBS-D motifs; see Additional data file 3) [3]. Threading plant NBS domains onto the crystal structure of human APAF-1 provides informative insights into the spatial arrangement and function of the motifs conserved in the plant NBS domains (Figure 3) [30,33]. The nucleotide-binding domain of APAF-1 consists of three subdomains: a three-layered α/β subdomain (containing the anchor region), a helical subdomain (containing the kinase-2 motif and P-loop) and a winged-helix subdomain (containing the MHDV motif; Figure 3). The specific binding of ADP by human APAF-1 is achieved by a total of eight direct and four water-mediated hydrogen bonds; the P-loop portion of the helical subdomain interacts with the α- and β-phosphates of ADP, a histidine and a serine residue on the winged-helix subdomain interacts with a phosphate and the sugar of ADP, and a small anchor region in the α/β subdomain stabilizes the adenine base [33]. The binding pocket and patterns of binding to ADP are well conserved in the threading models of TNLs (exemplified by the Arabidopsis protein RPS4) and CNLs (exemplified by the Arabidopsis protein RPS5; Figure 3) ([30] and P.K., unpublished work). The NBS domains of TNLs contain additional loops absent in the NBS domain of CNLs. TNLs and CNLs have four conserved motifs that are located around the catalytic cleft: the P-loop, the anchor region, and the MHDV motif (specifically the histidine residue), all of which serve to orient the ADP molecule, as well as the GLPL motif (the MHDV and GLPL motifs are named after their constituent amino acids in the single-letter code). 
While there is no obvious contact between ADP and the GLPL motif in human APAF-1, the conservation of its position on top of the binding site in APAF-1, RPS4 and RPS5 indicates that it may be involved in binding ADP. In addition, the last two aspartic acids in the kinase-2 motif are positioned to interact with the third phosphate of ATP, consistent with their role of coordination for the divalent metal ion required for phosphotransfer reactions, for example the Mg²⁺ of Mg-ATP (Figure 3). The anchor region in the α/β subdomain of APAF-1, which consists of the sequence Val-Thr-Arg, is present as Phe-Gly-Asn in RPS4 and as Val-Gly-Gln in RPS5. This anchor region, consisting of a hydrophobic (Val or Phe), a small (Gly or Thr) and a polar (Arg, Asn or Gln) amino acid, was previously unrecognized, but is highly conserved in plant NBS-LRR proteins (see Additional data file 3). Autoactivating mutations in two CNLs, potato Rx (Asp460Val) and tomato I2 (Asp495Val), map next to the histidine in the MHDV motif; these mutations may perturb the binding of the β-phosphate of ADP and result in a more open structure [30]. The LRR domain The LRR domain is a common motif found in more than 2,000 proteins, from viruses to eukaryotes, and it is involved in protein-protein interactions and ligand binding [1]. The crystal structures of more than 20 LRR proteins have revealed that LRR domains characteristically contain a series of β-sheets that form the concave face shaped like a horseshoe or banana [34]. Less is known, however, about the quaternary arrangements of LRR proteins. At least three different types of dimers have been observed, involving interactions of either their concave surfaces [35] or their convex surfaces [36,37], or by concatenation involving an antiparallel β-sheet at the interface [38]. 
Threading of the LRR domain of Arabidopsis RPS5 onto the crystal structure of the bovine decorin protein, a member of the small LRR proteoglycans (SLRP) protein family with a protein core composed of LRRs [35], provided a model consistent with a curved horseshoe-like surface of β-sheets (Figure 4; P.K., unpublished work). The number of repeats in the LRR domains in TNLs and CNLs of Arabidopsis is similar (mean 14, range 8 to 25), but this number can be considerably higher in other species. In the lettuce CNL Resistance Gene Candidate 2 (RGC2) proteins, an example of which is Dm3, the LRR domain appears to be duplicated and there can be as many as 47 LRRs in total [19]. Each LRR comprises a core of about 26 amino acids containing the Leu-xx-Leu-xx-Leu-x-Leu-xx-Cys/Asn-xx motif (where x is any amino acid), which forms a β-sheet; each core region is separated by a section of variable length that varies from zero to 30 amino acids. In many NBS-LRR proteins, the putative solvent-exposed residues (shown as x in the consensus sequence above) show significantly elevated ratios of nonsynonymous to synonymous substitutions, indicating that diversifying selection has maintained variation at these positions. The LRR domain is involved in determining the recognition specificity of several R proteins (for example [18,39-42]); direct interaction with pathogen proteins has rarely been shown, however. The LRR domain may be involved predominantly in regulatory intramolecular interactions. The LRR domain of the potato CNL Rx interacts with the NBS domain even when expressed in trans; this interaction is disrupted by the potato virus X elicitor, a viral coat protein that can induce a host defense response [43]. Also, the inner, concave surface of the β-sheets may not be the only binding surface. 
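
The LRR core consensus given in this passage (Leu-xx-Leu-xx-Leu-x-Leu-xx-Cys/Asn-xx) maps directly onto a regular expression over single-letter amino-acid codes. The following is a minimal sketch of such a scan; the function name and the test sequence are hypothetical. It matches leucine strictly at the consensus positions, which is a simplifying assumption, since real repeats often carry other hydrophobic residues there and a strict scan will miss them.

```python
import re

# Leu-xx-Leu-xx-Leu-x-Leu-xx-Cys/Asn-xx in single-letter code;
# "." stands for x (any amino acid). The lookahead allows overlapping hits.
LRR_CORE = re.compile(r"(?=(L..L..L.L..[CN]..))")


def find_lrr_cores(protein):
    """Return (0-based start, matched motif) for each strict consensus hit."""
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(protein.upper())]


# Hypothetical sequence with two consensus cores separated by a short linker.
sequence = "MK" + "LAALGGLTLSSNPE" + "QQQ" + "LKELDRLSLRNCTV"

for start, motif in find_lrr_cores(sequence):
    print(start, motif)
```

Counting hits per protein in this way gives only a crude lower bound on the number of repeats; profile-based methods are the usual choice for real annotation.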
The LRR domain of TLR3, a human Toll-like receptor, is predicted to form a heterodimer and to bind double-stranded RNA from pathogens against its looped surface, on the opposite side from the β-sheets [37]. Analysis using MEME identified few motifs in common between the LRR domains of TNLs and CNLs in Arabidopsis [3]. The third LRR was one of the few that contained a conserved motif. Mutation in this LRR of the CNL RPS5 results in epistatic inhibitory effects on multiple NBS-LRR proteins, suggesting that the LRR may interact with downstream signaling components [5,44]; also, a mutation within this LRR in the CNL Rx of potato results in a constitutively active form [45]. The carboxyl termini CNLs and TNLs differ markedly in the size and composition of their carboxy-terminal domains. Those of TNLs are larger and more variable than those of CNLs. CNLs typically have only 40-80 amino acids carboxy-terminal to the LRR domain, whereas the carboxyl termini of TNLs often have an additional 200-300 amino acids, equaling the size of the LRR domain. Several TNLs have extensions with similarity to other proteins [3]. One of the larger TNLs in Arabidopsis, RRS1, which becomes localized to the nucleus in response to infection, encodes a 1,388 amino-acid protein with a nuclear localization signal and a WRKY motif (a motif also found in zinc-finger transcription factors and containing the sequence Trp-Arg-Lys-Tyr) at the carboxyl terminus [46]. Function, localization and regulation Disease resistance is the only function so far demonstrated for NBS-LRR proteins; however, a role in resistance has yet to be confirmed for most. Functions in other areas of plant biology cannot be excluded, particularly for the more divergent members of the family. The simplest model for NBS-LRR R protein function is as receptors that bind effector molecules secreted by pathogens, but direct interactions between NBS-LRR R proteins and effector proteins have been detected only rarely [47,48]. 
In an alternative model, the 'guard hypothesis', NBS-LRR R proteins monitor the status of plant proteins targeted by pathogen effectors [49,50]. Such indirect detection of pathogens allows a limited number of NBS-LRR R proteins to detect the activity of multiple pathogen effectors that target points of vulnerability in the plant. This has been best characterized in Arabidopsis: the CNL protein RPM1 detects the phosphorylation of RPM1-Interacting Protein 4 (RIN4) by the pathogen effectors AvrB and AvrRpm1 from Pseudomonas syringae pv. glycinea and pv. maculicola, respectively, and elicits the resistance response (Figure 5) [51]. The elicitation of this response can be abrogated by a third effector, AvrRpt2 from P. syringae pv. tomato, a protease that cleaves RIN4 [52,53]. The disappearance of RIN4 is detected, however, by a second CNL, RPS2, that in turn elicits the defense response [54,55]. There is increasing evidence from several systems that other R proteins similarly act as guards of host targets rather than direct receptors, at least for bacterial effectors [56-58]. NBS-LRR proteins function as components of macromolecular complexes [59]. Yeast two-hybrid and, more recently, co-immunoprecipitation experiments have identified multiple interacting proteins. All of the constituents and details of the dynamics of these complexes have yet to be determined, however. Oligomerization of animal NOD proteins through the NBS domain or oligomerization of Toll-like receptors through the TIR domain is important for activating the signaling pathway in animal innate immune systems [60-64], but there are currently few data on the oligomerization of plant NBS-LRR proteins. Effector-induced self-oligomerization of the tobacco N protein (a TNL) has recently been demonstrated in Nicotiana benthamiana; the ability to oligomerize was retained after loss-of-function mutations in the RNBS-A motif and TIR domain, but lost after P-loop mutations [32]. 
Little is known about the regulation of the plant genes that encode NBS-LRRs. Consistent with the need for a rapid response to pathogen attack, many NBS-LRR-encoding genes are constitutively expressed at low levels in healthy, unchallenged tissue, although some show tissue-specific expression (X.T., unpublished work). They are upregulated, however, in response to bacterial flagellin, which induces basal resistance, suggesting that plants can establish a state of heightened sensitivity to pathogen attack [65,66]. Both TNLs and CNLs include members that undergo alternative splicing. Alternative splicing of Toll-like receptors in animals is common and splice variants of the mouse Toll-like receptor TLR4 may be part of a regulatory feedback loop inhibiting excessive responses to bacterial lipopolysaccharide [67,68]. The induction of splice variants upon pathogen recognition has been observed for plant NBS-LRR proteins, suggesting that alternative splicing may have a regulatory role in the plant defense response [68]. Multiple transcripts have been detected for several TNL-encoding genes (RPP5, RPS4, and RAC1 in Arabidopsis, L6 in flax, N in tobacco, Y-1 in potato, and Bs4 in tomato) and fewer CNL-encoding genes [69-76], although their significance to disease resistance is unclear. The ratio of transcripts from the tobacco N gene is critical for resistance to tobacco mosaic virus [71]. Both full-length and alternative transcripts are necessary for resistance mediated by RPS4 in Arabidopsis [73]. Triggering of basal resistance and/or cell death associated with specific resistance imposes a heavy cost and is therefore likely to be tightly regulated. There is growing evidence for multiple layers of negative regulation, paralleling that observed in mammals. One layer involves RIN4; the disappearance of RIN4 triggers the basal resistance response (see above) [4,51]. 
Another level involves the interaction between the LRR and NBS regions; the LRR can act in trans as a negative regulator of the NBS in the CNLs potato Rx and tomato Mi [42,43]. A third layer involves the conformational change of the NBS following hydrolysis of ATP [31]. NBS-LRR R protein activity may also be subject to regulation by heat-shock proteins such as the Hsp90 proteins [4]; both CNLs such as Arabidopsis RPM1 and potato Rx and TNLs such as the tobacco N protein require cytosolic HSP90 for their function [77-79]. The role of protein degradation in resistance signaling is unclear, but there is increasing evidence for its importance [80]. Two proteins, 'Required for Mla12 Resistance 1' (RAR1) and 'Suppressor of G2 Allele of SKP1' (SGT1), are required for the function of several R proteins that signal through different pathways [59]. The COP9 signalosome, a multiprotein complex involved in protein degradation, is required for resistance to tobacco mosaic virus mediated by the tobacco TNL N protein [81]. The Arabidopsis CNL protein RPM1 is degraded at the onset of the hypersensitive response [82]; RING-finger E3 ubiquitin ligases in Arabidopsis are involved in RPM1- and RPS2-mediated elicitation of the hypersensitive response [83]. Therefore, either specific or general proteolysis may have roles in controlling the amplitude of the defense response and the extent of cell death associated with the hypersensitive response. Most NBS-LRR proteins lack a signal peptide or membrane-spanning regions and are therefore assumed to be cytoplasmic. Fractionation studies and interactions in yeast with membrane-associated proteins suggest that several are localized to the inner side of the membrane [51,54,55,82]. Localization studies are challenging, however, because of the probable dynamic nature of complexes and because of the low endogenous expression levels of NBS-LRR proteins; consequently, data from overexpression studies are difficult to interpret. 
Plant NBS-LRR proteins act through a network of signaling pathways and induce a series of plant defense responses, such as activation of an oxidative burst, calcium and ion fluxes, mitogen-associated protein kinase cascade, induction of pathogenesis-related genes, and the hypersensitive response [4,84-86]. At least three independent, genetically defined signaling pathways in Arabidopsis are induced by NBS-LRR proteins [87]. TNLs and CNLs tend to signal through different downstream pathways: TNLs signal through the 'Enhanced Disease Susceptibility' protein EDS1 and CNLs through the 'Non-race specific Disease Resistance' protein NDR1, although this correlation is not absolute. A separate pathway independent of EDS1 and NDR1 is activated by the Arabidopsis CNLs RPP8 and RPP13. Several small signaling molecules in the plant defense response, such as salicylic acid, jasmonic acid, ethylene, and nitric oxide, are involved downstream of NBS-LRR proteins and there is complicated cross-talk between the different signaling pathways, involving both synergism and mutual antagonism between pathways [88-91]. Frontiers The scope and complexity of this protein family provide many opportunities and challenges for both evolutionary and functional studies. An important immediate goal is to obtain crystal structures of NBS-LRR proteins, either in their entirety or as individual domains with and without their ligands. The coevolution of NBS-LRR proteins with their cognate bacterial effectors and their plant targets is of considerable interest, particularly as understanding these genetic changes and selective forces could lead to strategies for generating plants with more durable disease resistance. 
We also need to address an intriguing conundrum: if the LRR domain is acting as a negative regulator of the NBS domain and NBS-LRR proteins are monitoring the status of conserved host proteins, why is there frequently a strong evolutionary signal of divergent selection acting on solvent-exposed residues on the concave surface of the LRR? Numerous questions remain at the functional level. Are all NBS-LRR proteins involved in plant defense, or do some have other functions? What are the constituents of the macromolecular complexes involving NBS-LRR proteins and what events occur upon pathogen challenge? Do these complexes often contain multiple NBS-LRR proteins [92]? Are pathogen effectors usually detected indirectly, through monitoring their activity on plant targets, or are some effectors, for example from oomycetes or fungi, detected directly by NBS-LRR proteins? Do the proteins with only some of the domains, such as the TN and CN proteins [27], function as regulatory or adaptor molecules? Other questions include the functions of the variable amino- and carboxy-terminal domains and the multiple layers of positive and negative regulation (transcriptional, alternative splicing, phosphorylation and particularly protein degradation). Also, what is the functional significance of the lack of TNLs in cereals, and does this result in a different spectrum of resistance responses? Finally, what is the molecular basis of 'restricted taxonomic functionality' (resistance function restricted to within a plant family) of NBS-LRR proteins [93] and which additional proteins are required for function in plants other than the source species? Ultimately, once the evolutionary mechanisms and structure-function relationships are understood in detail, it might be possible to generate NBS-LRR proteins with new recognition specificities that target key pathogen constituents, resulting in new, durable forms of resistance. 
Additional data files

The following additional data files are available: Additional data file 1 shows an alignment of 65 amino acids from 1,600 NBS sequences used to generate the neighbor-joining trees shown in Figure 2 and Additional data file 2; in both additional data files, parts (a) show TNL sequences and parts (b) CNL sequences. Additional data file 3 shows an alignment of NBS sequences used to generate the models of the NBS domain of RPS4 and RPS5 shown in Figure 3; PHYRE, a threading service available at [94], identified APAF-1 (PDB code 1z6t) as a reliable template to model the RPS4 and RPS5 NBS domains, with Z-scores of 5 × 10^-23 and 1 × 10^-18, respectively. The PHYRE pairwise sequence alignments of APAF-1 and RPS4 and of APAF-1 and RPS5 were collated into a single alignment without further refinement. Boxes show the positions of the eight motifs identified by Meyers et al. [3] and the position of the anchor region.

Supplementary Material

Additional data file 1: An alignment of 65 amino acids from 1,600 NBS sequences used to generate the neighbor-joining trees shown in Figure 2 and Additional data file 2.
Additional data file 2: A more detailed version of Figure 2.
Additional data file 3: An alignment of NBS sequences used to generate the models of the NBS domain of RPS4 and RPS5 shown in Figure 3.

                Author and article information

                Contributors
                ware@cshl.edu
                Journal
                Nature
                Publisher: Nature Publishing Group UK (London)
                ISSN: 0028-0836 (print), 1476-4687 (electronic)
                Published: 12 June 2017
                Volume: 546
                Issue: 7659
                Pages: 524-527
                Affiliations
                [1] GRID grid.225279.9, ISNI 0000 0004 0387 3667, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
                [2] GRID grid.423340.2, Pacific Biosciences, Menlo Park, California 94025, USA
                [3] GRID grid.470262.5, ISNI 0000 0004 0473 1353, BioNano Genomics, San Diego, California 92121, USA
                [4] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, Department of Plant Sciences and Center for Population Biology, University of California, Davis, Davis, California 95616, USA
                [5] GRID grid.463419.d, ISNI 0000 0004 0404 0958, USDA-ARS, Plant Genetics Research Unit, Columbia, Missouri 65211, USA
                [6] GRID grid.213876.9, ISNI 0000 0004 1936 738X, University of Georgia, Athens, Georgia 30602, USA
                [7] GRID grid.410445.0, ISNI 0000 0001 2188 0957, Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii 96822, USA
                [8] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, Department of Evolution and Ecology, University of California, Davis, California 95616, USA
                [9] GRID grid.17635.36, ISNI 0000 0004 1936 8657, Department of Plant Biology, University of Minnesota, St Paul, Minnesota 55108, USA
                [10] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, Department of Plant Sciences, Center for Population Biology, and Genome Center, University of California, Davis, California 95616, USA
                [11] GRID grid.5386.8, ISNI 0000 0004 1936 877X, USDA-ARS, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, New York 14853, USA
                Article
                BFnature22971
                10.1038/nature22971
                7052699
                28605751
                e6b0542a-19f9-4e05-b5dc-b7d3431a73f6
                © The Author(s) 2017

                This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 27 September 2016
                Accepted: 14 May 2017
                Categories
                Article
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature Limited 2017

                Uncategorized
                plant sciences,genetics,genome informatics
