A central challenge in interpreting personal genomes is determining which mutations are most likely to influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses data from 6,503 whole-exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
This work uses empirical single-nucleotide variant data from the NHLBI Exome Sequencing Project to introduce a genome-wide scoring system that ranks human genes in terms of their intolerance to standing functional genetic variation in the human population. Genes carrying relatively fewer or relatively more common functional variants in healthy individuals are often judged, respectively, more or less likely to cause certain kinds of disease. We show that this intolerance score correlates remarkably well with whether a gene is already known to cause Mendelian disease (P < 10⁻²⁶). Equally striking, however, are the differences in the relationship between standing genetic variation and disease-causing genes for different disease types. Considering the disorder classes defined in the human disease network of Goh et al. (2007), we show a nearly opposite pattern for genes linked to developmental disorders and those linked to immunological disorders, with the former being preferentially caused by genes that do not tolerate functional variation and the latter caused by genes with an excess of common functional variation. We conclude by showing that use of an intolerance ranking system can facilitate the interpretation of personal genomes and the identification of high-impact mutations through the gene in which they occur.
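The idea of scoring a gene by whether it carries more or less functional variation than expected from its apparently neutral variation can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that the expectation comes from an ordinary least-squares fit of functional variant counts on neutral variant counts across genes, and that the score is the residual divided by the standard deviation of all residuals. The function name, the gene labels, and the counts are hypothetical and are not taken from the ESP data.

```python
from statistics import mean, pstdev


def intolerance_scores(neutral_counts, functional_counts):
    """Return one standardized residual per gene.

    Assumption for illustration: expected functional variation is a linear
    function of neutral variation, fitted by ordinary least squares.
    Negative scores mean fewer functional variants than expected
    (more intolerant); positive scores mean more than expected.
    """
    x_bar = mean(neutral_counts)
    y_bar = mean(functional_counts)
    pairs = list(zip(neutral_counts, functional_counts))

    # Least-squares slope and intercept for y = intercept + slope * x.
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    if sxx == 0:
        raise ValueError("neutral counts must vary across genes")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Observed minus expected functional variation for each gene.
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    scale = pstdev(residuals)
    if scale == 0:
        raise ValueError("all genes lie exactly on the fitted line")
    return [r / scale for r in residuals]


# Hypothetical toy data: four made-up genes with made-up variant counts.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
neutral = [10, 20, 30, 40]
functional = [5, 2, 15, 20]

scores = intolerance_scores(neutral, functional)
# Rank from most intolerant (most negative score) to most tolerant.
ranked = sorted(zip(genes, scores), key=lambda pair: pair[1])
```

In this toy example, `GENE_B` carries far fewer functional variants than its neutral variation would predict, so it ranks as the most intolerant gene. A real analysis would need to define which variants count as functional and as neutral, and which allele-frequency threshold makes a variant common; those choices are left open here.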