
      Assembly of a pan-genome from deep sequencing of 910 humans of African descent

      research-article
      Nature genetics


          Abstract

          We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the African-descended populations, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes while the rest appear to be intergenic.
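The "collected all reads that failed to align" step can be illustrated with a minimal sketch. It assumes the alignments are available as SAM-format text lines and relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read. The function name `unaligned_reads` and the three example records are hypothetical; the abstract does not name the tools the authors used, so this is not their pipeline.

```python
UNMAPPED_FLAG = 0x4  # SAM convention: bit set when the read itself is unmapped


def unaligned_reads(sam_lines):
    """Yield (read_name, sequence) for every record flagged as unmapped."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 10 = SEQ
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Hypothetical records: read1 aligns to chr1, read2 failed to align.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]
novel = list(unaligned_reads(example))
print(novel)
```

In a workflow like the one described, reads selected this way would then be passed to an assembler to build contigs, and the contigs compared to one another to remove redundancy.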

          Editorial Summary:

          Assembly of a pan-genome from 910 humans of African descent identifies 296.5 Mb of novel DNA mapping to 125,715 distinct contigs. This African pan-genome contains ~10% more DNA than the current human reference genome.
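As a rough check of the "~10% more DNA" figure, the arithmetic can be sketched as below. The novel base count and contig count come from the abstract; the reference length of about 3.1 billion bp is an assumed round figure for GRCh38 and is not stated in the text.

```python
novel_bp = 296_485_284  # from the abstract
contigs = 125_715       # from the abstract
ASSUMED_REFERENCE_BP = 3_100_000_000  # assumption: approximate GRCh38 length

# Novel sequence as a fraction of the assumed reference length.
fraction_of_reference = novel_bp / ASSUMED_REFERENCE_BP
# Average length of a novel contig in base pairs.
mean_contig_bp = novel_bp / contigs

print(f"{fraction_of_reference:.1%} of the assumed reference length")
print(f"{mean_contig_bp:.0f} bp mean contig length")
```

Under that assumption the novel sequence comes to about 9.6% of the reference, consistent with the stated ~10%, and the mean contig length is roughly 2,358 bp.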


                Author and article information

                Journal
                Nature Genetics (Nat. Genet.)
                ISSN: 1061-4036 (print), 1546-1718 (electronic)
                Journal IDs: 9216904, 2419
                Dates: 16 October 2018, 19 November 2018, January 2019, 19 May 2019
                Volume 51, Issue 1, Pages 30-35
                Affiliations
                [1 ]Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205 USA
                [2 ]Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
                [3 ]Departments of Computer Science, Biology, and Mathematics, Harvey Mudd College, Claremont, CA 91711 USA
                [4 ]Department of Medicine, University of Colorado Denver, Aurora, CO 80045
                [5 ]Department of Medicine, Johns Hopkins University, Baltimore, MD 21224 USA
                [6 ]Department of Internal Medicine, Section on Pulmonary, Critical Care, Allergy and Immunologic Diseases, Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157 USA
                [7 ]Department of Public Health Sciences, Henry Ford Health System, Detroit, MI 48202 USA
                [8 ]Department of Medicine, University of California, San Francisco, San Francisco, CA 94143 USA
                [9 ]Department of Parasitology, Leiden University Medical Center, Leiden, Netherlands
                [10 ]Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216 USA
                [11 ]Institute for Immunological Research, Universidad de Cartagena, Cartagena 130000 Colombia
                [12 ]Department of Internal Medicine, Henry Ford Health System, Detroit, MI 48202 USA
                [13 ]Faculty of Medical Sciences Cave Hill Campus, The University of the West Indies, Bridgetown BB11000 Barbados
                [14 ]Department of Medicine, Vanderbilt University, Nashville, TN 37232 USA
                [15 ]Department of Medicine and Center for Global Health, University of Chicago, Chicago, IL 60637 USA
                [16 ]Department of Medicine, University of Chicago, Chicago, IL 60637 USA
                [17 ]Laboratório de Patologia Experimental, Centro de Pesquisas Gonçalo Moniz, Salvador, BA 40296-710 Brazil
                [18 ]Department of Human Genetics, University of Chicago, Chicago, IL 60637 USA
                [19 ]Department of Medicine, University of Arizona College of Medicine, Tucson, AZ 85724 USA
                [20 ]Centro de Neumologia y Alergias, San Pedro Sula 21102 Honduras
                [21 ]Caribbean Institute for Health Research, The University of the West Indies, Kingston 7, Jamaica
                [22 ]Pulmonary and Critical Care Medicine, Morehouse School of Medicine, Atlanta, GA 30310 USA
                [23 ]Department of Medicine, Einstein Medical Center, Philadelphia, PA 19141 USA
                [24 ]National Human Genome Center, Howard University College of Medicine, Washington, DC 20059 USA
                [25 ]Department of Microbiology, Howard University College of Medicine, Washington, DC 20059 USA
                [26 ]Department of Bioengineering & Therapeutic Sciences and Medicine, University of California, San Francisco, San Francisco, CA 94158 USA
                [27 ]Immunology Service, Universidade Federal da Bahia, Salvador BA 401110170 Brazil
                [28 ]Facultad de Ciencias Médicas, Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras 11101
                [29 ]Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
                [30 ]Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
                [31 ]Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
                Author notes

                Author Contributions. RMS designed and performed analyses and wrote the paper. JF performed analyses. VA pre-processed data. DP performed analyses. MD collected data and provided comments on the manuscript. NR, MPB, SC, CV, VEO, AML, CE, MY, JGW, JM, LAL, LKW, HW, LBW, COO, OO, RRO, CO, DLN, DAM, AM, JK, TH, NNH, MGF, JGF, MUF, GMD, LC, EGB, ERB, MIA, EFH, MC, and CF collected data. MAT, THB, and IR collected data and provided comments on the manuscript. RAM collected data. KCB collected data and provided comments on the manuscript. SLS conceived and advised the project and wrote the paper.

                [* ]Correspondence should be addressed to rsherman@jhu.edu or salzberg@jhu.edu.
                Article
                NIHMS ID: NIHMS1509230
                DOI: 10.1038/s41588-018-0273-y
                PMCID: PMC6309586
                PMID: 30455414

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Categories
                Article

                Genetics
