RefSeq: an update on mammalian reference sequences


Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins, and we summarize the current status of the RefSeqGene project.

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): January 2014

Publication date (Electronic): 19 November 2013

Publication date PMC-release: 19 November 2013

Volume: 42

Issue: D1, Database issue

Pages: D756-D763

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 435 5898; Email: pruitt@ncbi.nlm.nih.gov

Article

Publisher ID: gkt1114

DOI: 10.1093/nar/gkt1114

PMC ID: 3965018

PubMed ID: 24259432

SO-VID: 1fa874de-c677-4130-9ae7-b14d90e165eb

History

Date received: 23 September 2013

Date revision received: 21 October 2013

Date accepted: 22 October 2013

Page count

Pages: 8

Custom metadata

cover-date 1 January 2014

ScienceOpen disciplines: Genetics

