Genome size evolution: towards new model systems for old questions

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Genome size (GS) variation is a fundamental biological characteristic; however, its evolutionary causes and consequences are the topic of ongoing debate. Whether GS is a neutral trait or one subject to selective pressures, and how strong these selective pressures are, may remain open questions. Fundamentally, the genomic sequences responsible for this variation directly impact the potential evolutionary outcomes and, equally, are the targets of different evolutionary pressures. For example, duplications and deletions of genic regions (large or small) can have immediate and drastic phenotypic effects, while an expansion or contraction of non-coding DNA is less likely to cause catastrophic phenotypic effects. However, in the long term, the accumulation or deletion of ncDNA is likely to have larger effects. Modern sequencing technologies are allowing for the dissection of these proximate causes, but a combination of these new technologies with more traditional evolutionary experiments and approaches could revolutionize this debate and potentially resolve many of these arguments. Here, I discuss an ambitious way forward for GS research, putting it in context of historical debates, theories and sometimes contradictory evidence, and highlighting the promise of combining new sequencing technologies and analytical developments with more traditional experimental evolution approaches.

Related collections

Most cited references 95

Record: found
Abstract: found
Article: not found

RepeatModeler2 for automated genomic discovery of transposable element families

Jullien Flynn, Robert Hubley, Clément Goubert … (2020)

The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license ( https://github.com/Dfam-consortium/RepeatModeler , http://www.repeatmasker.org/RepeatModeler/ ).

0 comments Cited 861 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The origins of genome complexity.

Michael Lynch, John Conery (2003)

Complete genomic sequences from diverse phylogenetic lineages reveal notable increases in genome complexity from prokaryotes to multicellular eukaryotes. The changes include gradual increases in gene number, resulting from the retention of duplicate genes, and more abrupt increases in the abundance of spliceosomal introns and mobile genetic elements. We argue that many of these modifications emerged passively in response to the long-term population-size reductions that accompanied increases in organism size. According to this model, much of the restructuring of eukaryotic genomes was initiated by nonadaptive processes, and this in turn provided novel substrates for the secondary evolution of phenotypic complexity by natural selection. The enormous long-term effective population sizes of prokaryotes may impose a substantial barrier to the evolution of complex genomes and morphologies.

0 comments Cited 590 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.

Petr Novak, Pavel Neumann, Jiří Pech … (2013)

Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers. Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows to analyze several million sequence reads, which typically results in identification of most high and medium copy repeats in higher plant genomes.