Rapid turnover of effectors in grass powdery mildew (Blumeria graminis)

Abstract

Background

Grass powdery mildew (Blumeria graminis, Ascomycota) is a major pathogen of cereal crops and has become a model organism for obligate biotrophic fungal pathogens of plants. The sequenced genomes of two formae speciales (ff. spp.), B.g. hordei and B.g. tritici (pathogens of barley and wheat, respectively), were found to be enriched in candidate effector genes (CEGs). As in other filamentous pathogens, CEGs in B. graminis are under positive selection. Additionally, effectors are more likely than other genes to show presence-absence polymorphisms among strains.

Results

Here we identified effectors in the genomes of three additional host-specific lineages of B. graminis (B.g. poae, B.g. avenae and B.g. infecting Lolium), which diverged between 24 and 5 million years ago (Mya). We found that most CEGs in B. graminis are clustered in families and that most families are present in both reference genomes (B.g. hordei and B.g. tritici) and in the genomes of all three newly annotated lineages. We identified conserved protein domains, including a novel lipid-binding domain. The phylogenetic analysis showed that frequent gene duplications and losses shaped the diversity of the effector repertoires of the different lineages throughout their evolutionary history. We observed several lineage-specific expansions in which large clades of CEGs originated in only one lineage from a single gene through repeated gene duplications. When we applied a birth-death model, we found that the turnover rate (the rate at which genes are deleted and duplicated) of CEG families is much higher than that of non-CEG families. The analysis of genomic context revealed that the immediate surroundings of CEGs are enriched in transposable elements (TEs), which could play a role in the duplication and deletion of CEGs.
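To illustrate what a higher turnover rate means in a birth-death model, the following is a minimal sketch and not the analysis used in the article. It assumes a linear birth-death process in which each gene copy is duplicated and lost at the same per-copy rate; the function names, the rate values and the 10-million-year time span are hypothetical choices made only for this example.

```python
import random


def simulate_family_size(n0, rate, t_total, rng):
    """Simulate one gene family under a linear birth-death process.

    Assumption for illustration: every gene copy is duplicated at `rate`
    and lost at `rate` (events per copy per million years).
    Returns the family size after `t_total` million years.
    """
    n, t = n0, 0.0
    while n > 0:
        # Total event rate: n copies, each with duplication + loss.
        t += rng.expovariate(2.0 * rate * n)
        if t > t_total:
            break
        # Duplication and loss are equally likely under equal rates.
        n += 1 if rng.random() < 0.5 else -1
    return n


def size_mean_and_variance(n0, rate, t_total, replicates=2000, seed=1):
    """Mean and sample variance of final family size over many replicates."""
    rng = random.Random(seed)
    sizes = [simulate_family_size(n0, rate, t_total, rng)
             for _ in range(replicates)]
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)
    return mean, var


if __name__ == "__main__":
    # Hypothetical rates: a "slow" family versus a "fast" family.
    for label, rate in [("slow turnover", 0.02), ("fast turnover", 0.2)]:
        mean, var = size_mean_and_variance(n0=5, rate=rate, t_total=10.0)
        print(f"{label}: mean size {mean:.2f}, variance {var:.2f}")
```

With equal duplication and loss rates the expected family size stays at its starting value, while the variance in size among lineages grows with the turnover rate. Under this simplified model, that is why families with rapid turnover can differ strongly in copy number between closely related lineages.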

Conclusions

The CEG repertoires of related pathogens diverged dramatically over short evolutionary times because of rapid turnover and positive selection fixing non-synonymous mutations. While signatures of positive selection on effector sequences are the expected outcome of the evolutionary “arms race” between pathogen and plant immune system, it is more difficult to infer the mechanisms and evolutionary forces that maintained an extreme turnover rate in CEG families of B. graminis for several million years.

Electronic supplementary material

The online version of this article (10.1186/s12862-017-1064-2) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Thomas Wicker: wicker@botinst.uzh.ch

Beat Keller: bkeller@botinst.uzh.ch

Journal

Journal ID (nlm-ta): BMC Evol Biol

Journal ID (iso-abbrev): BMC Evol. Biol

Title: BMC Evolutionary Biology

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2148

Publication date (Electronic): 31 October 2017

Publication date PMC-release: 31 October 2017

Publication date Collection: 2017

Volume: 17

Electronic Location Identifier: 223

Affiliations

ISNI 0000 0004 1937 0650, GRID grid.7400.3, Department of Plant and Microbial Biology, University of Zürich, Zollikerstrasse 107, 8008 Zürich, Switzerland

Author information

Fabrizio Menardo http://orcid.org/0000-0002-7885-4482

Article

Publisher ID: 1064

DOI: 10.1186/s12862-017-1064-2

PMC ID: 5664452

PubMed ID: 29089018

SO-VID: a2ce410f-c82e-4832-96d5-08eeaf06df5e

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 29 March 2017

Date accepted: 2 October 2017

Funding

Funded by: FundRef http://dx.doi.org/10.13039/501100001711, Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Award ID: 310030-163260

Award Recipient: Beat Keller

Custom metadata

ScienceOpen disciplines: Evolutionary Biology

Keywords: blumeria graminis,powdery mildew,effectors
