BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge

Abstract

Background

As the cost of genome sequencing has fallen, interest has grown in rare variants and in methods to detect their association with disease. We developed BioBin, a flexible collapsing method guided by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases. LOKI contains resources such as dbSNP and Entrez Gene database information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), the Protein families database (Pfam), the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath (signal transduction pathways), the Open Regulatory Annotation Database (ORegAnno), the Biological General Repository for Interaction Datasets (BioGRID), the Pharmacogenomics Knowledge Base (PharmGKB), the Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from the UCSC Genome Browser. The novelty of BioBin is its access to comprehensive, knowledge-guided, multi-level binning. For example, bin boundaries can be formed using genomic locations from functional regions, evolutionary conserved regions, genes, and/or pathways.
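The binning idea described above can be illustrated with a minimal sketch. This is not BioBin's implementation: the `Variant` and `Region` types, the `bin_variants` function, the region names, and all coordinates are hypothetical, and the 0.03 frequency cutoff is borrowed from the Methods purely for illustration. Each region stands in for any knowledge-derived boundary (a gene, an ECR, a functional region), and a variant is placed in every region that contains it.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    maf: float  # minor allele frequency


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def bin_variants(variants, regions, maf_cutoff=0.03):
    """Group low-frequency variants by every region that contains them."""
    bins = defaultdict(list)
    for v in variants:
        if v.maf >= maf_cutoff:
            continue  # common variant: left out of the collapsed bins
        for r in regions:
            if r.chrom == v.chrom and r.start <= v.pos <= r.end:
                bins[r.name].append(v)
    return dict(bins)


# Hypothetical, overlapping regions and variants (illustration only).
regions = [
    Region("GENE_A", "1", 100, 200),
    Region("GENE_B", "1", 150, 300),
]
variants = [
    Variant("1", 120, 0.01),  # rare, inside GENE_A only
    Variant("1", 160, 0.02),  # rare, inside both regions
    Variant("1", 250, 0.20),  # common, skipped
    Variant("2", 120, 0.01),  # rare, but on a chromosome with no region
]
bins = bin_variants(variants, regions)
```

A higher-level bin such as a pathway could be sketched the same way by mapping each pathway name to the union of its member genes' bins; that mapping is likewise an assumption of this example rather than something stated in the abstract.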

Methods

We evaluated BioBin using simulated data and 1000 Genomes Project low-coverage data, testing the method with simulated causative variants and with a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset, using Complete Genomics data as controls. Kabuki syndrome is a congenital disorder that affects multiple organs and often involves intellectual disability.
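As a rough illustration of what a rare variant burden comparison between two groups involves, the sketch below counts rare alternate alleles per individual and averages them within each population. It is a simplified stand-in, not the authors' procedure: the abstract does not say how allele frequency was computed or which statistical test was applied, so computing the frequency over the pooled sample is an assumption, and the function names and genotype data are hypothetical.

```python
def alt_allele_freq(genotypes):
    """Alternate-allele frequency from per-individual allele counts (0, 1, or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def mean_rare_burden(genotypes_by_variant, populations, maf_cutoff=0.03):
    """Mean number of rare alternate alleles carried per individual, by population.

    Assumption: frequency is computed over the pooled sample, and a variant
    counts as rare when 0 < frequency < maf_cutoff.
    """
    rare = [
        g for g in genotypes_by_variant.values()
        if 0 < alt_allele_freq(g) < maf_cutoff
    ]
    totals, sizes = {}, {}
    for i, pop in enumerate(populations):
        totals[pop] = totals.get(pop, 0) + sum(g[i] for g in rare)
        sizes[pop] = sizes.get(pop, 0) + 1
    return {pop: totals[pop] / sizes[pop] for pop in totals}


def carriers(*indices, n=20):
    """Build a genotype vector in which the listed individuals are heterozygous."""
    genotypes = [0] * n
    for i in indices:
        genotypes[i] = 1
    return genotypes


# Hypothetical sample: individuals 0-9 labelled YRI, 10-19 labelled CEU.
populations = ["YRI"] * 10 + ["CEU"] * 10
genotypes = {
    "v1": carriers(0),                     # 1/40 = 0.025, rare
    "v2": carriers(3),                     # rare
    "v3": carriers(12),                    # rare
    "v4": carriers(1, 2, 5, 11, 14, 17),   # 6/40 = 0.15, common, excluded
}
burden = mean_rare_burden(genotypes, populations)
```

With this toy data, the group labelled YRI carries two rare alleles across ten individuals and the group labelled CEU carries one, so the per-individual means differ by a factor of two. A real analysis would compute burden per bin and apply a formal association test.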

Results

The results from our simulation studies indicate that the type I error rate is controlled; however, power falls quickly for small sample sizes using variants with modest effect sizes. Using BioBin, we were able to find simulated variants in genes with fewer than 20 loci, but found sensitivity to be much lower in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we applied BioBin to natural biological data from dbGaP and identified an interesting candidate gene for further study.

Conclusions

We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.

Author and article information

Contributors

Carrie B Moore

John R Wallace

Alex T Frase

Sarah A Pendergrass

Marylyn D Ritchie

Conference

Journal ID (nlm-ta): BMC Med Genomics

Journal ID (iso-abbrev): BMC Med Genomics

Title: BMC Medical Genomics

Publisher: BioMed Central

ISSN (Electronic): 1755-8794

Publication date (Collection): 2013

Publication date (Electronic): 7 May 2013

Volume: 6

Issue: Suppl 2

Page: S6

Affiliations

[1] Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA

[2] Center for Systems Genomics, Pennsylvania State University, University Park, PA 16802, USA

Article

Publisher ID: 1755-8794-6-S2-S6

DOI: 10.1186/1755-8794-6-S2-S6

PMC ID: 3654874

PubMed ID: 23819467

SO-VID: 9222f53f-38ef-41c3-93e7-3ab2572d7776

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conference name: Second Annual Translational Bioinformatics Conference (TBC 2012)
