Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing

Abstract

The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state-of-the-field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.
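
The contiguity comparison in the abstract can be illustrated with a small sketch. The abstract does not say which contiguity metric was used; contig N50 is a common choice and is assumed here. The function name `n50` and the contig lengths are hypothetical toy values, not data from the study.

```python
def n50(lengths):
    """Return the contig N50.

    N50 is the length L of the contig at which, when contigs are sorted
    from longest to shortest, the running total first reaches at least
    half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
short_read_contigs = [40, 30, 20, 10]
long_read_contigs = [1000, 300, 140]

# Fold difference in contiguity between the two toy assemblies.
fold_change = n50(long_read_contigs) / n50(short_read_contigs)
print(f"N50 fold change: {fold_change:.1f}x")
```

A fold change computed this way is one plausible reading of a statement such as "∼48× more contiguous"; the study's actual procedure may differ.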

Most cited references 26

Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie, Shane A. McCarthy, Olivier Fedrigo … (2021)

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species (1–4). To address this issue, the international Genome 10K (G10K) consortium (5, 6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences. The Vertebrate Genome Project has used an optimized pipeline to generate high-quality genome assemblies for sixteen species (representing all major vertebrate classes), which have led to new biological insights.

A new view of the tree of life.

Laura A. Hug, Brett J. Baker, Karthik Anantharaman … (2016)

The tree of life is one of the most important organizing principles in biology(1). Gene surveys suggest the existence of an enormous number of branches(2), but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships(3-5) or on the known, well-classified diversity of life with an emphasis on eukaryotes(6). These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts(7,8). Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.

Opportunities and challenges in long-read sequencing data analysis

Shanika Amarasinghe, Shian Su, Xueyi Dong … (2020)

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

Author and article information

Contributors

Federico Hoffmann (Role: Associate Editor)

Journal

Journal ID (nlm-ta): Genome Biol Evol

Journal ID (iso-abbrev): Genome Biol Evol

Journal ID (publisher-id): gbe

Title: Genome Biology and Evolution

Publisher: Oxford University Press

ISSN (Electronic): 1759-6653

Publication date Collection: August 2021

Publication date (Electronic): 21 June 2021

Publication date PMC-release: 21 June 2021

Volume: 13

Issue: 8

Electronic Location Identifier: evab138

Affiliations

[1] School of Biological Sciences, Washington State University, Pullman, Washington, USA

[2] Department of Biology, University of Rochester, New York, USA

[3] LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany

[4] Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany

[5] Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA

[6] Institute for Insect Biotechnology, Justus-Liebig-University, Giessen, Germany

[7] Data Science Lab, Smithsonian Institution, Washington, District of Columbia, USA

Author notes

Corresponding authors: E-mails: scott.hotaling@wsu.edu; paul_frandsen@byu.edu.

Author information

Scott Hotaling https://orcid.org/0000-0002-5965-0986

Joanna L Kelley https://orcid.org/0000-0002-7731-605X

Paul B Frandsen https://orcid.org/0000-0002-4801-7579

Article

Publisher ID: evab138

DOI: 10.1093/gbe/evab138

PMC ID: 8358217

PubMed ID: 34152413

SO-VID: 3cea2058-fd1a-4656-9a25-6cc152c75bf7

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 10 June 2021

Page count

Pages: 7
