UniProt: a worldwide hub of protein knowledge

Abstract

The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. Detailed annotations extracted from the literature by expert curators have been collected for over half a million of these proteins. These annotations are supplemented by annotations provided by rule-based automated systems, and those imported from other resources. In this article we describe significant updates that we have made to the resource over the last 2 years. We have greatly expanded the number of Reference Proteomes that we provide, and in particular we have focussed on increasing the number of viral Reference Proteomes. The UniProt website has been augmented with new data visualizations for the subcellular localization of proteins as well as their structure and interactions. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.

Most cited references 31

A METTL3-METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation

Jianzhao Liu, Yanan Yue, Dali Han … (2013)

N6-methyladenosine (m6A) is the most prevalent and reversible internal modification in mammalian messenger and non-coding RNAs. We report here that human METTL14 catalyzes m6A RNA methylation. Together with METTL3, the only previously known m6A methyltransferase, these two proteins form a stable heterodimer core complex of METTL3-14 that functions in cellular m6A deposition on mammalian nuclear RNAs. WTAP, a mammalian splicing factor, can interact with this complex and affect this methylation.

Cited 949 times

UniProt: the universal protein knowledgebase

(2016)

The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.

Cited 781 times

Ensembl 2018

Daniel Zerbino, Premanand Achuthan, Wasiu Akanni … (2017)

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.

Cited 660 times

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 08 January 2019

Publication date (Electronic): 05 November 2018

Publication date PMC-release: 05 November 2018

Volume: 47

Issue: Database issue

Pages: D506-D515

Affiliations

[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

[2] SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland

[3] Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street NW, Suite 1200, Washington, DC 20007, USA

[4] Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA

Author notes

To whom correspondence should be addressed. Tel: +44 1223 494 100; Fax: +44 1223 494 468; Email: agb@ebi.ac.uk

Present address: Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Article

Publisher ID: gky1049

DOI: 10.1093/nar/gky1049

PMC ID: 6323992

PubMed ID: 30395287

SO-VID: 42fd9466-139a-4a99-a103-bda97c18ecf3

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 18 October 2018

Date revision received: 15 October 2018

Date received: 14 September 2018

Page count

Pages: 10

Funding

Funded by: National Institutes of Health 10.13039/100000002

Award ID: U24HG007822

Funded by: National Human Genome Research Institute 10.13039/100000051

Award ID: U41HG007822

Award ID: U41HG002273

Funded by: National Institute of General Medical Sciences 10.13039/100000057

Award ID: R01GM080646

Award ID: P20GM103446

Award ID: U01GM120953

Funded by: Biotechnology and Biological Sciences Research Council 10.13039/501100000268

Award ID: BB/M011674/1

Funded by: British Heart Foundation 10.13039/501100000274

Award ID: RG/13/5/30112
