dbNSFP: A Lightweight Database of Human Nonsynonymous SNPs and Their Functional Predictions

Abstract

With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (of being deleterious) plays an important role in filtering or prioritizing nonsynonymous SNPs (NSs) for further analysis. Unfortunately, different prediction algorithms use different information, and each has its own strengths and weaknesses. It has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one. However, querying predictions from different databases and Web servers for different algorithms is both tedious and time-consuming, especially when dealing with the huge number of NSs identified by exome sequencing. To facilitate the process, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005). It is the first integrated database of functional predictions from multiple algorithms for the comprehensive collection of human NSs. dbNSFP is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP. Hum Mutat 32:894–899, 2011. © 2011 Wiley-Liss, Inc.
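As a rough illustration of the lookup workflow the abstract describes, the sketch below indexes a small tab-delimited table by variant and retrieves all scores for a list of query variants in one pass. The column names, the in-memory table, and every score value are hypothetical placeholders chosen for this example; they are not the actual dbNSFP file layout or real predictions, and the helper names `build_index` and `annotate` are likewise assumptions.

```python
import csv
import io

# Hypothetical excerpt of a per-variant score table. Column names and
# values are placeholders, NOT the real dbNSFP layout or real predictions.
EXAMPLE_TABLE = (
    "chr\tpos\tref\talt\tSIFT\tPolyphen2\tLRT\tMutationTaster\tPhyloP\n"
    "1\t1000\tA\tG\t0.02\t0.98\t0.001\t0.99\t2.5\n"
    "1\t1000\tA\tT\t0.40\t0.10\t0.300\t0.05\t2.5\n"
    "2\t2000\tC\tT\t0.00\t1.00\t0.000\t1.00\t4.1\n"
)

# Score columns assumed for this sketch (one per algorithm named in the abstract).
SCORE_COLUMNS = ["SIFT", "Polyphen2", "LRT", "MutationTaster", "PhyloP"]


def build_index(handle):
    """Map (chr, pos, ref, alt) to a dict of scores for each table row."""
    reader = csv.DictReader(handle, delimiter="\t")
    index = {}
    for row in reader:
        key = (row["chr"], int(row["pos"]), row["ref"], row["alt"])
        index[key] = {name: float(row[name]) for name in SCORE_COLUMNS}
    return index


def annotate(variants, index):
    """Pair each query variant with its scores, or None if it is absent."""
    return [(variant, index.get(variant)) for variant in variants]


index = build_index(io.StringIO(EXAMPLE_TABLE))
queries = [("1", 1000, "A", "G"), ("3", 3000, "G", "A")]
results = annotate(queries, index)
for variant, scores in results:
    print(variant, scores)
```

Keying the index on chromosome, position, reference allele, and alternative allele keeps different substitutions at the same site distinct. For a table covering tens of millions of variants, a per-chromosome or on-disk index would likely be preferable to a single in-memory dictionary.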

Author and article information

Journal

Journal ID (nlm-ta): Hum Mutat

Journal ID (iso-abbrev): Hum. Mutat

Journal ID (publisher-id): humu

Title: Human Mutation

Publisher: Wiley Subscription Services, Inc., A Wiley Company (Hoboken)

ISSN (Print): 1059-7794

ISSN (Electronic): 1098-1004

Publication date (Print): August 2011

Publication date (Electronic): 21 April 2011

Volume: 32

Issue: 8

Pages: 894-899

Affiliations

Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas

Author notes

*Correspondence to: Xiaoming Liu, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Herman Pressler Drive, E529, Houston, Texas 77030. E-mail: xiaoming.liu@uth.tmc.edu

Communicated by George Patrinos

Additional Supporting Information may be found in the online version of this article.

Contract grant sponsor: The National Institutes of Health; Contract grant numbers: RC2-HL02419-01; RC2 HL103010-01; 1U01HG005728-01.

Article

DOI: 10.1002/humu.21517

PMC ID: 3145015

PubMed ID: 21520341

SO-VID: edc15eec-36f1-4396-8227-eb85567bafd7

License:

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.
