
      Reference exome data for Australian Aboriginal populations to support health-based research

      data-paper


          Abstract

          Whole exome sequencing (WES) is a popular and successful technology which is widely used in both research and clinical settings. However, there is a paucity of reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 50 Aboriginal individuals from the Northern Territory (NT) of Australia and compare these to 72 previously published exomes from a Western Australian (WA) population of Martu origin. Sequence data for both NT and WA samples were processed using an ‘intersect-then-combine’ (ITC) approach, using GATK and SAMtools to call variants. A total of 289,829 variants were identified in at least one individual in the NT cohort and 248,374 variants in at least one individual in the WA cohort. Of these, 166,719 variants were present in both cohorts, whilst 123,110 variants were private to the NT cohort and 81,655 were private to the WA cohort. Our data set provides a useful reference point for genomic studies on Aboriginal Australians.
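The cohort comparison reported in the abstract amounts to set arithmetic over variant calls. The following is a minimal sketch, not the authors' pipeline: it assumes each variant can be keyed by chromosome, position, reference allele and alternate allele, and the `Variant` type, the `compare_cohorts` helper and the toy variants are hypothetical names and data introduced only for illustration. The final lines check that the counts quoted in the abstract are mutually consistent; the combined total is derived here and is not stated in the source.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_cohorts(a: set, b: set) -> dict:
    """Count variants shared by two cohorts and private to each one."""
    return {
        "shared": len(a & b),      # seen in at least one individual of both cohorts
        "private_a": len(a - b),   # seen only in cohort a
        "private_b": len(b - a),   # seen only in cohort b
        "union": len(a | b),       # distinct variants across both cohorts
    }


# Toy data (invented for illustration, not taken from the study).
nt = {Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"), Variant("2", 50, "G", "A")}
wa = {Variant("1", 100, "A", "G"), Variant("3", 75, "T", "C")}
toy = compare_cohorts(nt, wa)

# Consistency check on the counts quoted in the abstract.
NT_TOTAL, WA_TOTAL, SHARED = 289_829, 248_374, 166_719
nt_private = NT_TOTAL - SHARED           # matches the reported 123,110
wa_private = WA_TOTAL - SHARED           # matches the reported 81,655
combined = NT_TOTAL + WA_TOTAL - SHARED  # derived here, not stated in the source
```

Keying on all four fields means that two different alternate alleles at the same position count as distinct variants; a real comparison would also need both call sets normalised against the same reference build.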

          Abstract

          Measurement(s) Aboriginal Australian • DNA • sequence feature annotation
          Technology Type(s) Whole Exome Sequencing • DNA sequencing • sequence annotation
          Factor Type(s) ancestry • sex • age
          Sample Characteristic - Organism Homo sapiens
          Sample Characteristic - Location Northern Territory

          Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12040638

          Most cited references (13)


          dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs.

          The purpose of dbNSFP is to provide a one-stop resource for functional predictions and annotations for human nonsynonymous single-nucleotide variants (nsSNVs) and splice-site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome-sequencing study. A list of all potential nsSNVs and ssSNVs based on the human reference sequence was created, and functional predictions and annotations were curated and compiled for each SNV. Here, we report a recent major update of the database to version 3.0. The SNV list has been rebuilt based on GENCODE 22, and the database currently includes 82,832,027 nsSNVs and ssSNVs. An attached database, dbscSNV, which compiles all potential human SNVs within splicing consensus regions and their deleteriousness predictions, adds another 15,030,459 potentially functional SNVs. Eleven prediction scores (MetaSVM, MetaLR, CADD, VEST3, PROVEAN, 4× fitCons, fathmm-MKL, and DANN) and allele frequencies from the UK10K cohorts and the Exome Aggregation Consortium (ExAC), among others, have been added. The original seven prediction scores in v2.0 (SIFT, 2× Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM) as well as many SNV and gene functional annotations have been updated. dbNSFP v3.0 is freely available at http://sites.google.com/site/jpopgen/dbNSFP.

            Scaling accurate genetic variant discovery to tens of thousands of samples

            Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy, and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.

              The ExAC browser: displaying reference data information from over 60 000 exomes

              Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 individuals in the Exome Aggregation Consortium (ExAC). The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, we provide a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. This browser is open-source, freely available at http://exac.broadinstitute.org, and has already been used extensively by clinical laboratories worldwide.

                Author and article information

                Contributors
                timo.lassmann@telethonkids.org.au
                Journal: Scientific Data (Sci Data)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2052-4463
                Published: 29 April 2020
                Volume: 7
                Article number: 129
                Affiliations
                [1] Telethon Kids Institute, The University of Western Australia, Perth Children’s Hospital, Perth, Western Australia, Australia
                [2] Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia (ISNI 0000 0001 2157 559X; GRID grid.1043.6)
                [3] Centre for Aboriginal Medical and Dental Health, The University of Western Australia, Crawley, Western Australia (ISNI 0000 0004 1936 7910; GRID grid.1012.2)
                [4] School of Education, The University of Wollongong, New South Wales, Australia (ISNI 0000 0004 0486 528X; GRID grid.1007.6)
                [5] Victorian Infectious Disease Service, The Royal Melbourne Hospital, and Doherty Department, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Victoria, Australia (ISNI 0000 0001 2179 088X; GRID grid.1008.9)
                [6] Group A Streptococcal Research Group, Murdoch Childrens Research Institute, Melbourne, Victoria, Australia and Centre for International Child Health, Department of Paediatrics, Royal Children’s Hospital, Melbourne, Victoria, Australia
                [7] Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia (ISNI 0000 0000 9760 5620; GRID grid.1051.5)
                [8] Department of Public Health and Primary Care, The University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
                Author information
                http://orcid.org/0000-0002-1368-8356
                http://orcid.org/0000-0002-0784-7277
                http://orcid.org/0000-0002-0138-2691
                Article
                Publisher ID: 463
                DOI: 10.1038/s41597-020-0463-1
                PMCID: PMC7190730
                PMID: 32350262
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

                History
                Received: 25 September 2019
                Accepted: 24 March 2020
                Funding
                Funded by: Department of Health | National Health and Medical Research Council (NHMRC), FundRef https://doi.org/10.13039/501100000925
                Award ID: APP634301
                Categories
                Data Descriptor

                genetics, genetics research
