67
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology.

          Results

          To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina ( Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue ( Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici ( Pgt) and stripe rust, P. striiformis f. sp. tritici ( Pst), and poplar leaf rust Melampsora species, and the corn smut fungus, Ustilago maydis ( Um). While extensive homologies were found, many genes appeared novel and species-specific; over 40% of genes did not match any known sequence in existing databases. Focusing on spore stages, direct comparison to Um identified potential functional homologs, possibly allowing heterologous functional analysis in that model fungus. Many potentially secreted protein genes were identified by similarity searches against genes and proteins of Pgt and Melampsora spp., revealing apparent orthologs.

          Conclusions

          The current set of Pt unigenes contributes to gene discovery in this major cereal pathogen and will be invaluable for gene model verification in the genome sequence.

          Related collections

          Most cited references46

          • Record: found
          • Abstract: not found
          • Article: not found

          GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

          J Besemer (2001)
          Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Consed: a graphical tool for sequence finishing.

            Sequencing of large clones or small genomes is generally done by the shotgun approach (Anderson et al. 1982). This has two phases: (1) a shotgun phase in which a number of reads are generated from random subclones and assembled into contigs, followed by (2) a directed, or finishing phase in which the assembly is inspected for correctness and for various kinds of data anomalies (such as contaminant reads, unremoved vector sequence, and chimeric or deleted reads), additional data are collected to close gaps and resolve low quality regions, and editing is performed to correct assembly or base-calling errors. Finishing is currently a bottleneck in large-scale sequencing efforts, and throughput gains will depend both on reducing the need for human intervention and making it as efficient as possible. We have developed a finishing tool, consed, which attempts to implement these principles. A distinguishing feature relative to other programs is the use of error probabilities from our programs phred and phrap as an objective criterion to guide the entire finishing process. More information is available at http:// www.genome.washington.edu/consed/consed. html.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              GeneMark.hmm: new solutions for gene finding.

              The number of completely sequenced bacterial genomes has been growing fast. There are computer methods available for finding genes but yet there is a need for more accurate algorithms. The GeneMark. hmm algorithm presented here was designed to improve the gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into naturally derived hidden Markov model framework with gene boundaries modeled as transitions between hidden states. We also used the specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets including 10 complete bacterial genomes. It was shown that the new algorithm is significantly more accurate than GeneMark in exact gene prediction. Interestingly, the high gene finding accuracy was observed even in the case when Markov models of order zero, one and two were used. We present the analysis of false positive and false negative predictions with the caution that these categories are not precisely defined if the public database annotation is used as a control.
                Bookmark

                Author and article information

                Journal
                BMC Genomics
                BMC Genomics
                BioMed Central
                1471-2164
                2011
                24 March 2011
                : 12
                : 161
                Affiliations
                [1 ]Pacific Agri-Food Research Centre, Agriculture & Agri-Food Canada, Summerland, BC, V0H 1Z0, Canada
                [2 ]USDA-ARS-HWWGRU, Manhattan, KS 66506, USA
                [3 ]School of Biosciences, University of Nottingham, Loughborough, UK
                [4 ]Dept. Biomedical Engineering and Computational Science and Engineering Division, Georgia Institute of Technology, Atlanta, GA 30332-0535, USA
                [5 ]Environmental & Life Sciences Graduate Program, Trent University, Peterborough, ON, K9J 7B8, Canada
                [6 ]Institute for Cereal Crops Improvement, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
                [7 ]Cereal Research Centre, Agriculture & Agri-Food Canada, Winnipeg, MB, R3T 2M9, Canada
                [8 ]Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4E6, Canada
                [9 ]Forensic Science Program Trent University, Peterborough, ON, K9J 7B8, Canada
                [10 ]Bee Biology & Systematics Laboratory, USDA-ARS, N. Logan, UT 84341, USA
                Article
                1471-2164-12-161
                10.1186/1471-2164-12-161
                3074555
                21435244
                3f6e2d52-8396-4bc5-b892-3ba388d40c96
                Copyright ©2011 Xu et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 6 October 2010
                : 24 March 2011
                Categories
                Research Article

                Genetics
                Genetics

                Comments

                Comment on this article