      The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

      research-article

          Abstract

          In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
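
The annotation cycle described in the abstract (annotate a submitted genome against the family collection, then feed public genomes back into that collection) can be illustrated with a toy sketch. This is not RAST or SEED code: every name here (`FamilyCollection`, `submit_genome`, `kmers`) is hypothetical, and shared 3-mer overlap with a made-up 0.5 threshold merely stands in for whatever comparison the real FIGfam engine performs.

```python
from dataclasses import dataclass, field

UNKNOWN = "hypothetical protein"


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class FamilyCollection:
    """Toy stand-in for a FIGfam-like collection: function -> member sequences."""
    families: dict[str, list[str]] = field(default_factory=dict)

    def annotate(self, protein: str, min_shared: float = 0.5) -> str:
        """Return the function of the best-matching family, or UNKNOWN."""
        query = kmers(protein)
        best_function, best_score = UNKNOWN, 0.0
        for function, members in self.families.items():
            for member in members:
                # Fraction of the query's k-mers also present in this member.
                score = len(query & kmers(member)) / max(len(query), 1)
                if score > best_score:
                    best_function, best_score = function, score
        return best_function if best_score >= min_shared else UNKNOWN

    def add_public_genome(self, annotations: dict[str, str]) -> None:
        """Feed annotated proteins back into their families (the 'cycle')."""
        for protein, function in annotations.items():
            if function != UNKNOWN:
                self.families.setdefault(function, []).append(protein)


def submit_genome(collection: FamilyCollection, proteins: list[str],
                  make_public: bool) -> dict[str, str]:
    """Annotate called proteins; if public, let them populate the collection."""
    annotations = {p: collection.annotate(p) for p in proteins}
    if make_public:
        collection.add_public_genome(annotations)
    return annotations
```

In this sketch, a private submission leaves the collection unchanged, while a public one enlarges the families that later submissions are compared against, which is the feedback loop the abstract credits for scalability.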

          Most cited references (22)

          High-throughput generation, optimization and analysis of genome-scale metabolic models.

          Genome-scale metabolic models have proven to be valuable for predicting organism phenotypes from genotypes. Yet efforts to develop new models are failing to keep pace with genome sequencing. To address this problem, we introduce the Model SEED, a web-based resource for high-throughput generation, optimization and analysis of genome-scale metabolic models. The Model SEED integrates existing methods and introduces techniques to automate nearly every step of this process, taking approximately 48 h to reconstruct a metabolic model from an assembled genome sequence. We apply this resource to generate 130 genome-scale metabolic models representing a taxonomically diverse set of bacteria. Twenty-two of the models were validated against available gene essentiality and Biolog data, with the average model accuracy determined to be 66% before optimization and 87% after optimization.

            Update on activities at the Universal Protein Resource (UniProt) in 2013

            The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase. It integrates, interprets and standardizes data from numerous resources to achieve the most comprehensive catalogue of protein sequences and functional annotation. UniProt comprises four major components, each optimized for different uses, the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.

              CDD: conserved domains and protein three-dimensional structure

              CDD, the Conserved Domain Database, is part of NCBI’s Entrez query and retrieval system and is also accessible via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. CDD provides annotation of protein sequences with the location of conserved domain footprints and functional sites inferred from these footprints. Pre-computed annotation is available via Entrez, and interactive search services accept single protein or nucleotide queries, as well as batch submissions of protein query sequences, utilizing RPS-BLAST to rapidly identify putative matches. CDD incorporates several protein domain and full-length protein model collections, and maintains an active curation effort that aims at providing fine grained classifications for major and well-characterized protein domain families, as supported by available protein three-dimensional (3D) structure and the published literature. To this date, the majority of protein 3D structures are represented by models tracked by CDD, and CDD curators are characterizing novel families that emerge from protein structure determination efforts.

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res, nar)
                Oxford University Press
                ISSN: 0305-1048, 1362-4962
                January 2014
                29 November 2013
                Volume: 42
                Issue: D1, Database issue
                Pages: D206-D214
                Affiliations
                1 Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
                2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
                3 Computation Institute, University of Chicago, Chicago, IL 60637, USA
                4 Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
                5 Department of Computer Science, San Diego State University, San Diego, CA 92182, USA
                6 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24060, USA
                7 Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
                8 Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 630 325 4178; Fax: +1 630 325 4179; Email: veronika@thefig.info
                Article
                Article ID: gkt1226
                DOI: 10.1093/nar/gkt1226
                PMCID: PMC3965101
                PMID: 24293654
                © The Author(s) 2013. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 3 October 2013
                : 4 November 2013
                : 5 November 2013
                Page count
                Pages: 9
                Categories
                II. Protein sequence and structure, motifs and domains
                Custom metadata
                1 January 2014

                Genetics
