PubChem 2019 update: improved access to chemical data

Abstract

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements have been made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information on food and agricultural chemicals. PubChem released new web interfaces, such as the PubChem Target View page, the Sources page, the Bioactivity dyad pages and the Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface called PUG-View. This paper describes these new developments in PubChem.
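The abstract names PUG-View but does not describe its request format. The following is a minimal sketch, assuming the URL pattern `https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/<record type>/<id>/JSON` from PubChem's own PUG-View documentation (not from this paper). The helper names `pug_view_url` and `fetch_record` are hypothetical and exist only for illustration.

```python
import json
import urllib.request
from typing import Union
from urllib.parse import quote

# Base of the PUG-View "data" endpoint (assumed from PubChem's PUG-View documentation).
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def pug_view_url(record_type: str, record_id: Union[int, str], fmt: str = "JSON") -> str:
    """Build a PUG-View data URL, e.g. for a compound record identified by its CID.

    Hypothetical helper: it only assembles the URL string and makes no network call.
    """
    return f"{PUG_VIEW_BASE}/{quote(record_type)}/{quote(str(record_id))}/{fmt}"


def fetch_record(record_type: str, record_id: Union[int, str]) -> dict:
    """Download one PUG-View record as parsed JSON (hypothetical helper; needs network access)."""
    url = pug_view_url(record_type, record_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

As a usage example, `fetch_record("compound", 2244)` would request the full annotation record for CID 2244 (aspirin). The sketch keeps URL construction separate from the HTTP call so the request format can be checked without touching the network; the layout of the returned JSON should be taken from the PUG-View documentation rather than assumed.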

Author and article information

Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
Correspondence: To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559; Email: bolton@ncbi.nlm.nih.gov
Journal: Nucleic Acids Research (Nucleic Acids Res), Oxford University Press
ISSN: 0305-1048 (print); 1362-4962 (online)
Published online: 29 October 2018; issue date: 08 January 2019
Volume: 47
Issue: Database issue
Pages: D1102-D1109
PMID: 30371825
PMCID: PMC6324075
DOI: 10.1093/nar/gky1033
Published by Oxford University Press on behalf of Nucleic Acids Research 2018.

            This work is written by (a) US Government employee(s) and is in the public domain in the US.

Page count: 8
Funding: National Institutes of Health (Funder ID 10.13039/100000002)
Categories: Database Issue; Genetics
