<p class="first" id="d5563738e183">The genetic causes of many Mendelian disorders
remain undefined. Factors such as the lack
of large multiplex families, locus heterogeneity, and incomplete penetrance hamper
gene discovery for many disorders. Previous work suggests that gene-based burden testing, in which
the aggregate burden of rare, protein-altering variants in each gene is compared between
case and control subjects, might overcome some of these limitations. The increasing
availability of large-scale public sequencing databases such as the Genome Aggregation
Database (gnomAD) makes it possible to use these databases as controls in burden testing, obviating
the need for additional control sequencing for each study. However, using public databases
as controls poses several challenges, including the lack of individual-level
data, differences in ancestry, and differences in sequencing platforms and data processing.
To illustrate the approach of using public data as controls, we analyzed whole-exome
sequencing data from 393 individuals with idiopathic hypogonadotropic hypogonadism
(IHH), a rare disorder with significant locus heterogeneity and incomplete penetrance,
against control subjects from gnomAD (n = 123,136). We leveraged presumably benign
synonymous variants to calibrate our approach. Through iterative analyses, we systematically
addressed and overcame various sources of artifact that can arise when using public
control data. In particular, we introduce an approach for highly adaptable variant
quality filtering that leads to well-calibrated results. Our approach “re-discovered”
genes previously implicated in IHH (
<i>FGFR1</i>,
<i>TACR3</i>,
<i>GNRHR</i>). Furthermore, we identified a significant burden in
<i>TYRO3</i>, a gene implicated in hypogonadotropic hypogonadism in mice. Finally,
we developed
a user-friendly software package, TRAPD (Test Rare vAriants with Public Data), for performing
gene-based burden testing against public databases.
</p>