      Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing

          Abstract

          The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state-of-the-field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.


                Author and article information

                Journal
                Genome Biology and Evolution (Genome Biol Evol)
                Oxford University Press
                ISSN: 1759-6653
                August 2021
                21 June 2021
                Volume: 13
                Issue: 8
                Article: evab138
                Affiliations
                [1] School of Biological Sciences, Washington State University, Pullman, Washington, USA
                [2] Department of Biology, University of Rochester, New York, USA
                [3] LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt, Germany
                [4] Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
                [5] Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
                [6] Institute for Insect Biotechnology, Justus-Liebig-University, Giessen, Germany
                [7] Data Science Lab, Smithsonian Institution, Washington, District of Columbia, USA
                Author notes
                Author information
                https://orcid.org/0000-0002-5965-0986
                https://orcid.org/0000-0002-7731-605X
                https://orcid.org/0000-0002-4801-7579
                Article
                evab138
                10.1093/gbe/evab138
                8358217
                34152413
                3cea2058-fd1a-4656-9a25-6cc152c75bf7
                © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 10 June 2021
                Page count
                Pages: 7
                Categories
                Letter
                AcademicSubjects/SCI01130
                AcademicSubjects/SCI01140

                Genetics
                Insecta, Arthropoda, arthropod genomics, long-read sequencing, Pacific Biosciences, Oxford Nanopore
