      proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes

      research-article


          Abstract

          Microbiology depends on the availability of annotated microbial genomes for many applications. Comparative genomics approaches have been a major advance, but consistent and accurate annotations of genomes can be hard to obtain. In addition, newer concepts such as the pan-genome concept are still being implemented to help answer biological questions. Hence, we present proGenomes2, which provides 87 920 high-quality genomes in a user-friendly and interactive manner. Genome sequences and annotations can be retrieved individually or by taxonomic clade. Every genome in the database has been assigned to a species cluster and most genomes could be accurately assigned to one or multiple habitats. In addition, general functional annotations and specific annotations of antibiotic resistance genes and single nucleotide variants are provided. In short, proGenomes2 provides threefold more genomes, enhanced habitat annotations, updated taxonomic and functional annotation and improved linkage to the NCBI BioSample database. The database is available at http://progenomes.embl.de/.


          Most cited references (17)


          Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome".

          The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.

            The microbial pan-genome.

            A decade after the beginning of the genomic era, the question of how genomics can describe a bacterial species has not been fully addressed. Experimental data have shown that in some species new genes are discovered even after sequencing the genomes of several strains. Mathematical modeling predicts that new genes will be discovered even after sequencing hundreds of genomes per species. Therefore, a bacterial species can be described by its pan-genome, which is composed of a "core genome" containing genes present in all strains, and a "dispensable genome" containing genes present in two or more strains and genes unique to single strains. Given that the number of unique genes is vast, the pan-genome of a bacterial species might be orders of magnitude larger than any single genome.
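            The core/dispensable split described in this abstract can be illustrated with a minimal sketch. It assumes each strain is already reduced to a set of gene-family identifiers; the function name, strain names, and gene names below are hypothetical toy values, not data from the cited work or from proGenomes2.

```python
from collections import Counter


def partition_pan_genome(strain_genes):
    """Split gene families into core, shared-dispensable and strain-unique sets.

    strain_genes maps a strain name to the set of gene families it carries
    (hypothetical input format, assumed for illustration only).
    """
    n_strains = len(strain_genes)
    if n_strains < 2:
        raise ValueError("need at least two strains to separate core from unique genes")

    # Count in how many strains each gene family occurs.
    counts = Counter(gene for genes in strain_genes.values() for gene in set(genes))

    core = {g for g, c in counts.items() if c == n_strains}  # present in all strains
    unique = {g for g, c in counts.items() if c == 1}        # present in exactly one strain
    shared = set(counts) - core - unique                     # in two or more, but not all
    return core, shared, unique


# Toy example with made-up strain and gene names.
strains = {
    "strain_A": {"dnaA", "gyrB", "capX", "phage1"},
    "strain_B": {"dnaA", "gyrB", "capX"},
    "strain_C": {"dnaA", "gyrB", "tnpZ"},
}
core, shared, unique = partition_pan_genome(strains)
pan_genome = core | shared | unique
```

            In this sketch the dispensable genome is the union of `shared` and `unique`, and the pan-genome is the union of all three sets.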

              Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.

              An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism.

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048, 1362-4962
                08 January 2020
                24 October 2019
                Volume: 48
                Issue: D1
                Pages: D621-D625
                Affiliations
                [1] Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
                [2] Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany
                [3] Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
                [4] Institute of Microbiology, Department of Biology, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
                [5] Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
                [6] Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
                [7] Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
                [8] Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
                [9] Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69120 Heidelberg, Germany
                [10] Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
                Author notes
                To whom correspondence should be addressed. Email: d.r.mende@amsterdamumc.nl
                Correspondence may also be addressed to Peer Bork. Email: bork@embl.de
                Author information
                http://orcid.org/0000-0001-6831-4557
                Article
                Publisher ID: gkz1002
                DOI: 10.1093/nar/gkz1002
                PMCID: PMC7145564
                PMID: 31647096
                ScienceOpen ID: a98f2b97-c30b-49e7-8814-8afba62e41cf
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 18 October 2019
                Revised: 15 October 2019
                Received: 15 September 2019
                Page count
                Pages: 5
                Funding
                Funded by: European Molecular Biology Laboratory 10.13039/100013060
                Funded by: European Research Council 10.13039/501100000781
                Award ID: ERC-2014-AdG
                Funded by: Heidelberg Center for Human Bioinformatics
                Award ID: de.NBI #031A537B
                Funded by: ETH Zürich 10.13039/501100003006
                Funded by: Helmut Horten Foundation
                Funded by: Fudan University 10.13039/501100003347
                Funded by: Shanghai Municipal Science and Technology
                Award ID: 2018SHZDZX01
                Funded by: ZHANGJIANG LAB
                Funded by: Consejería de Educación, Juventud y Deporte de la Comunidad de Madrid 10.13039/501100008433
                Funded by: Fondo Social Europeo 10.13039/501100004895
                Award ID: PEJ-2017-AI/TIC-7514
                Funded by: Ministerio de Ciencia, Innovación y Universidades 10.13039/100014440
                Award ID: PGC2018-098073-A-I00 MCIU/AEI/FEDER
                Funded by: Horizon 2020 10.13039/100010661
                Award ID: 686070
                Categories
                Database Issue

                Genetics
