Characterising and Predicting Haploinsufficiency in the Human Genome

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.

Author Summary

Humans, like most complex organisms, have two copies of most genes in their genome, one from the mother and one from the father. This redundancy provides a back-up copy for most genes, should one copy be lost through mutation. For a minority of genes, one functional copy is not enough to sustain normal human function, and mutations causing the loss of function of one of the copies of such genes are a major cause of childhood developmental diseases. Over the past 20 years medical geneticists have identified over 300 such genes, but it is not known how many of the 22,000 genes in our genome may also be sensitive to gene loss. By comparing these ∼300 genes known to be sensitive to gene loss with over 1,000 genes where loss of a single copy does not result in disease, we have identified some key evolutionary and functional similarities between genes sensitive to loss of a single copy. We have used these similarities to predict for most genes in the genome, whether loss of a single copy is likely to result in disease. These predictions will help in the interpretation of mutations seen in patients.

Related collections

Most cited references 26

Record: found
Abstract: found
Article: not found

Ensembl 2009

T. J. P. Hubbard, B. L. Aken, S. Ayling … (2009)

The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases, and other information for chordate, selected model organism and disease vector genomes. As of release 51 (November 2008), Ensembl fully supports 45 species, and three additional species have preliminary support. New species in the past year include orangutan and six additional low coverage mammalian genomes. Major additions and improvements to Ensembl since our previous report include a major redesign of our website; generation of multiple genome alignments and ancestral sequences using the new Enredo-Pecan-Ortheus pipeline and development of our software infrastructure, particularly to support the Ensembl Genomes project (http://www.ensemblgenomes.org/).

0 comments Cited 366 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

MINT: the Molecular INTeraction database

Andrew Chatr-aryamontri, Arnaud Ceol, Luisa Palazzi … (2006)

The Molecular INTeraction database (MINT, ) aims at storing, in a structured format, information about molecular interactions (MIs) by extracting experimental details from work published in peer-reviewed journals. At present the MINT team focuses the curation work on physical interactions between proteins. Genetic or computationally inferred interactions are not included in the database. Over the past four years MINT has undergone extensive revision. The new version of MINT is based on a completely remodeled database structure, which offers more efficient data exploration and analysis, and is characterized by entries with a richer annotation. Over the past few years the number of curated physical interactions has soared to over 95 000. The whole dataset can be freely accessed online in both interactive and batch modes through web-based interfaces and an FTP server. MINT now includes, as an integrated addition, HomoMINT, a database of interactions between human proteins inferred from experiments with ortholog proteins in model organisms ().

0 comments Cited 323 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Integrated detection and population-genetic analysis of SNPs and copy number variation.

Steven A McCarroll, Finny Kuruvilla, Joshua M. Korn … (2008)

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.

0 comments Cited 321 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): plos

Journal ID (pmc): plosgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA )

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date Collection: October 2010

Publication date (Print): October 2010

Publication date (Electronic): 14 October 2010

Volume: 6

Issue: 10

Electronic Location Identifier: e1001154

Affiliations

[1 ]Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom

[2 ]Center for Systems and Synthetic Biology, Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America

[3 ]Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, South Korea

University of Aarhus, Denmark

Author notes

* E-mail: meh@ 123456sanger.ac.uk

Conceived and designed the experiments: NH MEH. Analyzed the data: NH. Contributed reagents/materials/analysis tools: IL EMM. Wrote the paper: NH MEH.

Article

Publisher ID: 10-PLGE-RA-2311R2

DOI: 10.1371/journal.pgen.1001154

PMC ID: 2954820

PubMed ID: 20976243

SO-VID: a99e1892-d786-4a3d-beaf-ee27ce456c04

Copyright © Huang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 6 January 2010

Date accepted : 10 September 2010

Page count

Pages: 11

Comments

Comment on this article

scite_

Cited by 245

See all cited by

Most referenced authors 1,698

See all reference authors

Characterising and Predicting Haploinsufficiency in the Human Genome

Read this article at

Abstract

Author Summary

Related collections

Genome Integrity

Most cited references 26

Ensembl 2009

MINT: the Molecular INTeraction database

Integrated detection and population-genetic analysis of SNPs and copy number variation.

Author and article information

Contributors

Journal

Affiliations

Author notes

Article

History

Page count

Categories

Comments

Comment on this article

Similar content 59

Cited by 245

Most referenced authors 1,698