
      Towards complete and error-free genome assemblies of all vertebrate species

      Nature

      Nature Publishing Group UK

      Genome assembly algorithms, Evolutionary genetics, Molecular evolution, Research data

          Abstract

          High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5, 6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

          Abstract

          The Vertebrate Genomes Project has used an optimized pipeline to generate high-quality genome assemblies for sixteen species (representing all major vertebrate classes), which have led to new biological insights.

          Most cited references 90


          The Sequence Alignment/Map format and SAMtools

          Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
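          As a rough illustration of the format this summary describes, the sketch below reads one tab-separated SAM alignment line and checks two FLAG bits. It assumes the 11 mandatory columns and the FLAG values 0x4 (unmapped) and 0x10 (reverse strand) from the SAM specification; the names `SamRecord`, `parse_sam_line`, `is_unmapped`, and `is_reverse`, and the example read, are hypothetical and are not part of SAMtools.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """First six mandatory SAM columns (hypothetical helper type)."""
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def is_unmapped(rec: SamRecord) -> bool:
    # FLAG bit 0x4: the read itself is unmapped
    return bool(rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    # FLAG bit 0x10: the read aligns to the reverse strand
    return bool(rec.flag & 0x10)


# Made-up example record, for illustration only
example = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(example)
```

          Real pipelines would normally rely on SAMtools or an established binding rather than hand parsing; the sketch only shows how compact the text layout is.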

            Fast and accurate short read alignment with Burrows–Wheeler transform

            Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk
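            The backward search mentioned in this abstract can be sketched for the exact-match case as follows. This is a minimal teaching sketch, not BWA's implementation: it builds the transform by sorting all rotations, recounts occurrences with `str.count` instead of using sampled tables, and does not handle the mismatches and gaps that BWA supports. The function names `bwt` and `backward_search` are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for short strings)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def backward_search(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text."""
    # first[c] = number of characters in the text that sort before c
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # Half-open interval [lo, hi) of sorted rotations prefixed by the
    # pattern suffix processed so far; start with all rotations.
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + bwt_str.count(ch, 0, lo)
        hi = first[ch] + bwt_str.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("GATTACATTA")
hits = backward_search(transformed, "TTA")
```

            The pattern is consumed from its last character to its first, which is why the procedure is called backward search; each step narrows the interval using only the transformed string and the per-character counts.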

              BLAST+: architecture and applications

              Background Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. Results We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. Conclusion The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

                Author and article information

                Contributors
                kj2@sanger.ac.uk
                gene@mpi-cbg.de
                rd109@cam.ac.uk
                adam.phillippy@nih.gov
                ejarvis@rockefeller.edu
                Journal: Nature
                Publisher: Nature Publishing Group UK (London)
                ISSN: 0028-0836 (print), 1476-4687 (electronic)
                Published: 28 April 2021
                Volume: 592
                Issue: 7856
                Pages: 737-746
                Affiliations
                [1] GRID grid.280128.1, ISNI 0000 0001 2233 9230, Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
                [2] GRID grid.5335.0, ISNI 0000000121885934, Department of Genetics, University of Cambridge, Cambridge, UK
                [3] GRID grid.10306.34, ISNI 0000 0004 0606 5382, Wellcome Sanger Institute, Cambridge, UK
                [4] GRID grid.134907.8, ISNI 0000 0001 2166 1519, Vertebrate Genome Lab, The Rockefeller University, New York, NY USA
                [5] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, The Genome Center, University of California Davis, Davis, CA USA
                [6] GRID grid.134907.8, ISNI 0000 0001 2166 1519, Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY USA
                [7] GRID grid.418779.4, ISNI 0000 0001 0708 0355, Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany
                [8] Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
                [9] DNAnexus Inc., Mountain View, CA USA
                [10] GRID grid.31501.36, ISNI 0000 0004 0470 5905, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
                [11] GRID grid.31501.36, ISNI 0000 0004 0470 5905, Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
                [12] GRID grid.42505.36, ISNI 0000 0001 2156 6853, University of Southern California, Los Angeles, CA USA
                [13] GRID grid.280285.5, ISNI 0000 0004 0507 7840, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD USA
                [14] GRID grid.225360.0, ISNI 0000 0000 9709 7726, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
                [15] GRID grid.419537.d, ISNI 0000 0001 2113 4567, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
                [16] GRID grid.4488.0, ISNI 0000 0001 2111 7257, DRESDEN-concept Genome Center, Dresden, Germany
                [17] Novogene, Durham, NC USA
                [18] GRID grid.419550.c, ISNI 0000 0004 0501 3839, Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
                [19] GRID grid.5590.9, ISNI 0000000122931605, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
                [20] GRID grid.11914.3c, ISNI 0000 0001 0721 1626, School of Biology, University of St Andrews, St Andrews, UK
                [21] GRID grid.266683.f, ISNI 0000 0001 2184 9220, University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA USA
                [22] GRID grid.1010.0, ISNI 0000 0004 1936 7304, School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia Australia
                [23] GRID grid.134936.a, ISNI 0000 0001 2162 3504, Bond Life Sciences Center, University of Missouri, Columbia, MO USA
                [24] GRID grid.255364.3, ISNI 0000 0001 2191 0423, Department of Biology, East Carolina University, Greenville, NC USA
                [25] GRID grid.1003.2, ISNI 0000 0000 9320 7537, UQ Genomics, University of Queensland, Brisbane, Queensland Australia
                [26] GRID grid.26090.3d, ISNI 0000 0001 0665 0280, Department of Biological Sciences, Clemson University, Clemson, SC USA
                [27] The Genetic Rescue Foundation, Wellington, New Zealand
                [28] Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
                [29] GRID grid.29980.3a, ISNI 0000 0004 1936 7830, Department of Zoology, University of Otago, Dunedin, New Zealand
                [30] GRID grid.134563.6, ISNI 0000 0001 2168 186X, University of Arizona Genetics Core, Tucson, AZ USA
                [31] GRID grid.35937.3b, ISNI 0000 0001 2270 9879, Department of Life Sciences, Natural History Museum, London, UK
                [32] GRID grid.7362.0, ISNI 0000000118820937, School of Natural Sciences, Bangor University, Gwynedd, UK
                [33] GRID grid.9811.1, ISNI 0000 0001 0658 7699, Department of Biology, University of Konstanz, Konstanz, Germany
                [34] GRID grid.38142.3c, ISNI 000000041936754X, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA
                [35] GRID grid.261112.7, ISNI 0000 0001 2173 3359, Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA USA
                [36] GRID grid.5284.b, ISNI 0000 0001 0790 3681, Department of Biology, University of Antwerp, Antwerp, Belgium
                [37] GRID grid.425948.6, ISNI 0000 0001 2159 802X, Naturalis Biodiversity Center, Leiden, The Netherlands
                [38] GRID grid.5110.5, ISNI 0000000121539003, Institute of Biology, Karl-Franzens University of Graz, Graz, Austria
                [39] GRID grid.15276.37, ISNI 0000 0004 1936 8091, Florida Museum of Natural History, University of Florida, Gainesville, FL USA
                [40] GRID grid.495510.c, Center for Systems Biology, Dresden, Germany
                [41] GRID grid.6612.3, ISNI 0000 0004 1937 0642, Zoological Institute, University of Basel, Basel, Switzerland
                [42] Tag.bio, San Francisco, CA USA
                [43] GRID grid.205975.c, ISNI 0000 0001 0740 6917, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
                [44] GRID grid.422956.e, ISNI 0000 0001 2225 0471, San Diego Zoo Global, Escondido, CA USA
                [45] GRID grid.423340.2, ISNI 0000 0004 0640 9878, Pacific Biosciences, Menlo Park, CA USA
                [46] Digital BioLogic, Ivanić-Grad, Croatia
                [47] GRID grid.470262.5, ISNI 0000 0004 0473 1353, Bionano Genomics, San Diego, CA USA
                [48] GRID grid.504177.0, Arima Genomics, San Diego, CA USA
                [49] GRID grid.504403.6, Dovetail Genomics, Santa Cruz, CA USA
                [50] Independent Researcher, Santa Cruz, CA USA
                [51] GRID grid.473715.3, ISNI 0000 0004 6475 7299, CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
                [52] GRID grid.5612.0, ISNI 0000 0001 2172 2676, Universitat Pompeu Fabra, Barcelona, Spain
                [53] GRID grid.164295.d, ISNI 0000 0001 0941 7177, Department of Computer Science, University of Maryland College Park, College Park, MD USA
                [54] GRID grid.19373.3f, ISNI 0000 0001 0193 3564, School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China
                [55] GRID grid.170205.1, ISNI 0000 0004 1936 7822, Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL USA
                [56] GRID grid.26090.3d, ISNI 0000 0001 0665 0280, Department of Genetics and Biochemistry, Clemson University, Clemson, SC USA
                [57] GRID grid.5288.7, ISNI 0000 0000 9758 5690, Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
                [58] GRID grid.419560.f, ISNI 0000 0001 2154 3117, Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
                [59] GRID grid.440425.3, Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia
                [60] GRID grid.440425.3, Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia
                [61] Qatar Falcon Genome Project, Doha, Qatar
                [62] GRID grid.4708.b, ISNI 0000 0004 1757 2822, Department of Biosciences, University of Milan, Milan, Italy
                [63] eGnome, Inc., Seoul, Republic of Korea
                [64] LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
                [65] GRID grid.438154.f, ISNI 0000 0001 0944 0975, Senckenberg Research Institute, Frankfurt, Germany
                [66] GRID grid.7839.5, ISNI 0000 0004 1936 9721, Goethe-University, Faculty of Biosciences, Frankfurt, Germany
                [67] GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI-Shenzhen, Shenzhen, China
                [68] GRID grid.29857.31, ISNI 0000 0001 2097 4281, Department of Biology, Pennsylvania State University, University Park, PA USA
                [69] GRID grid.29857.31, ISNI 0000 0001 2097 4281, Center for Medical Genomics, Pennsylvania State University, University Park, PA USA
                [70] GRID grid.29857.31, ISNI 0000 0001 2097 4281, Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA USA
                [71] GRID grid.29857.31, ISNI 0000 0001 2097 4281, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA USA
                [72] GRID grid.29857.31, ISNI 0000 0001 2097 4281, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA USA
                [73] Hoonygen, Seoul, Korea
                [74] GRID grid.507516.0, ISNI 0000 0004 7661 536X, Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
                [75] GRID grid.7247.6, ISNI 0000000419370714, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
                [76] GRID grid.5254.6, ISNI 0000 0001 0674 042X, Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
                [77] GRID grid.5947.f, ISNI 0000 0001 1516 2393, University Museum, NTNU, Trondheim, Norway
                [78] GRID grid.21155.32, ISNI 0000 0001 2034 1839, China National Genebank, BGI-Shenzhen, Shenzhen, China
                [79] GRID grid.5254.6, ISNI 0000 0001 0674 042X, Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
                [80] GRID grid.419010.d, ISNI 0000 0004 1792 7072, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
                [81] GRID grid.9227.e, ISNI 0000000119573309, Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
                [82] GRID grid.418812.6, ISNI 0000 0004 0620 9243, Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
                [83] GRID grid.421647.2, ISNI 0000 0001 2197 9375, Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario Canada
                [84] GRID grid.467700.2, ISNI 0000 0001 2182 2028, Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC USA
                [85] GRID grid.205975.c, ISNI 0000 0001 0740 6917, Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA USA
                [86] GRID grid.413575.1, ISNI 0000 0001 2167 1581, Howard Hughes Medical Institute, Chevy Chase, MD USA
                [87] GRID grid.1214.6, ISNI 0000 0000 8716 3312, The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD USA
                [88] GRID grid.507680.c, ISNI 0000 0001 2230 3166, Walter Reed Army Institute of Research, Silver Spring, MD USA
                [89] GRID grid.8273.e, ISNI 0000 0001 1092 7967, Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK
                [90] Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
                [91] GRID grid.425902.8, ISNI 0000 0000 9601 989X, Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
                [92] GRID grid.473715.3, ISNI 0000 0004 6475 7299, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
                [93] GRID grid.7080.f, Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
                [94] GRID grid.7886.1, ISNI 0000 0001 0768 2743, School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
                [95] GRID grid.35403.31, ISNI 0000 0004 1936 9991, Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL USA
                [96] GRID grid.1018.8, ISNI 0000 0001 2342 0938, School of Life Science, La Trobe University, Melbourne, Victoria Australia
                [97] GRID grid.266100.3, ISNI 0000 0001 2107 4242, Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA USA
                [98] GRID grid.35915.3b, ISNI 0000 0001 0413 4629, Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation
                [99] GRID grid.261241.2, ISNI 0000 0001 2168 8324, Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL USA
                [100] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, Department of Evolution and Ecology, University of California Davis, Davis, CA USA
                [101] GRID grid.27860.3b, ISNI 0000 0004 1936 9684, John Muir Institute for the Environment, University of California Davis, Davis, CA USA
                [102] GRID grid.4488.0, ISNI 0000 0001 2111 7257, Faculty of Computer Science, Technical University Dresden, Dresden, Germany
                Article
                3451
                DOI: 10.1038/s41586-021-03451-0
                PMCID: PMC8081667
                PMID: 33911273
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                Categories
                Article
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature Limited 2021
