Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq

Abstract

Background

Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output are often required to rule out false-positive predictions and precisely describe mutational events.

Results

We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold).
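As a rough illustration of why coverage evenness helps separate real junctions from artifacts, the sketch below counts how many distinct read start positions (per strand) span a candidate junction. It is a toy under stated assumptions, not the breseq implementation: the `Read` structure, the `distinct_start_score` function, and the `min_overlap` default are hypothetical names and values chosen for this example, and the abstract does not specify the statistical model that breseq actually applies.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """A read aligned to a candidate junction sequence (hypothetical structure)."""
    start: int     # 0-based start coordinate on the junction sequence
    length: int    # aligned length in bases
    reverse: bool  # True if the read aligned to the reverse strand


def distinct_start_score(reads: Iterable[Read], junction_pos: int,
                         min_overlap: int = 5) -> int:
    """Count distinct (start, strand) pairs among reads spanning the junction.

    A read counts as spanning only if at least `min_overlap` aligned bases
    fall on each side of `junction_pos`. The default of 5 is an arbitrary
    value for this sketch.
    """
    seen = set()
    for read in reads:
        end = read.start + read.length        # exclusive end coordinate
        left = junction_pos - read.start      # bases before the junction
        right = end - junction_pos            # bases after the junction
        if left >= min_overlap and right >= min_overlap:
            seen.add((read.start, read.reverse))
    return len(seen)


# Evenly tiled reads: 26 different start positions cross position 100.
even_reads = [Read(s, 36, s % 2 == 0) for s in range(70, 96)]

# Stacked reads: the same depth, but every read starts at the same position,
# as might happen with amplification duplicates or a mapping artifact.
stacked_reads = [Read(80, 36, False)] * 26

print(distinct_start_score(even_reads, junction_pos=100))
print(distinct_start_score(stacked_reads, junction_pos=100))
```

Both read sets give the same raw depth at the junction, but only the evenly tiled set scores highly, which is the intuition behind preferring an evenness measure over a simple read count.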

Conclusions

Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1039) contains supplementary material, which is available to authorized users.

Most cited references 21

Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq.

Daniel Deatherage, Jeffrey E. Barrick (2014)

Next-generation DNA sequencing (NGS) can be used to reconstruct eco-evolutionary population dynamics and to identify the genetic basis of adaptation in laboratory evolution experiments. Here, we describe how to run the open-source breseq computational pipeline to identify and annotate genetic differences found in whole-genome and whole-population NGS data from haploid microbes where a high-quality reference genome is available. These methods can also be used to analyze mutants isolated in genetic screens and to detect unintended mutations that may occur during strain construction and genome editing.

The molecular diversity of adaptive convergence.

Olivier Tenaillon, Alejandra Rodríguez-Verdugo, Rebecca L Gaut … (2012)

To estimate the number and diversity of beneficial mutations, we experimentally evolved 115 populations of Escherichia coli to 42.2°C for 2000 generations and sequenced one genome from each population. We identified 1331 total mutations, affecting more than 600 different sites. Few mutations were shared among replicates, but a strong pattern of convergence emerged at the level of genes, operons, and functional complexes. Our experiment uncovered a set of primary functional targets of high temperature, but we estimate that many other beneficial mutations could contribute to similar adaptive outcomes. We inferred the pervasive presence of epistasis among beneficial mutations, which shaped adaptive trajectories into at least two distinct pathways involving mutations either in the RNA polymerase complex or the termination factor rho.

Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing.

Heewook Lee, Ellen Popodi, Haixu Tang … (2012)

Knowledge of the rate and nature of spontaneous mutation is fundamental to understanding evolutionary and molecular processes. In this report, we analyze spontaneous mutations accumulated over thousands of generations by wild-type Escherichia coli and a derivative defective in mismatch repair (MMR), the primary pathway for correcting replication errors. The major conclusions are (i) the mutation rate of a wild-type E. coli strain is ~1 × 10⁻³ per genome per generation; (ii) mutations in the wild-type strain have the expected mutational bias for G:C > A:T mutations, but the bias changes to A:T > G:C mutations in the absence of MMR; (iii) during replication, A:T > G:C transitions preferentially occur with A templating the lagging strand and T templating the leading strand, whereas G:C > A:T transitions preferentially occur with C templating the lagging strand and G templating the leading strand; (iv) there is a strong bias for transition mutations to occur at 5'ApC3'/3'TpG5' sites (where bases 5'A and 3'T are mutated) and, to a lesser extent, at 5'GpC3'/3'CpG5' sites (where bases 5'G and 3'C are mutated); (v) although the rate of small (≤4 nt) insertions and deletions is high at repeat sequences, these events occur at only 1/10th the genomic rate of base-pair substitutions. MMR activity is genetically regulated, and bacteria isolated from nature often lack MMR capacity, suggesting that modulation of MMR can be adaptive. Thus, comparing results from the wild-type and MMR-defective strains may lead to a deeper understanding of factors that determine mutation rates and spectra, how these factors may differ among organisms, and how they may be shaped by environmental conditions.

Author and article information

Contributors

Jeffrey E Barrick: jbarrick@cm.utexas.edu

Geoffrey Colburn: geoffreycolburn@gmail.com

Daniel E Deatherage: daniel.deatherage@gmail.com

Charles C Traverse: chucktraverse@gmail.com

Matthew D Strand: MDStrand@gmail.com

Jordan J Borges: gravity@utexas.edu

David B Knoester: knoestdb@miamioh.edu

Aaron Reba: aaronreba@gmail.com

Austin G Meyer: austin.g.meyer@gmail.com

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (Electronic): 29 November 2014

Publication date Collection: 2014

Volume: 15

Issue: 1

Electronic Location Identifier: 1039

Affiliations

Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, Center for Systems and Synthetic Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712 USA

Department of Computer Science and Software Engineering, Miami University, Oxford, OH 45056 USA

Article

Publisher ID: 6751

DOI: 10.1186/1471-2164-15-1039

PMC ID: 4300727

PubMed ID: 25432719

SO-VID: a122d06a-5131-4c04-81ed-68f4db831d06

License:

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 3 September 2014

Date accepted: 18 November 2014

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: genome resequencing, experimental evolution, strain engineering, insertion sequence, translocation
