Last-gen nostalgia: a lighthearted rant and reflection on genome sequencing culture

Abstract

I sometimes see them in my dreams. The colorful peaks and troughs, the sharp, crisp waves spread across my computer screen, the rolling nitrogenous mountains, each with its own nucleotide sitting solidly on the summit. I'm talking about electropherograms, of course. Remember them? Those beautiful but oh so “old-gen” bioinformatics data generated from automated Sanger sequencing machines, such as the Applied Biosystems 370, the geriatric of genome sequencers. Don't laugh. It was these capillary-based electrophoretic technologies that gave us the draft human genome sequence (Lander et al., 2001) and the genome maps of many other model organisms, from the bacterium Haemophilus influenzae to the yeast Saccharomyces cerevisiae to the multicellular green alga Volvox carteri (Fleischmann et al., 1995; Goffeau et al., 1996; Prochnik et al., 2010).

As a grad student, I spent countless hours pruning, editing, assembling, and occasionally oohing and aahing over Sanger sequences (Sanger et al., 1977; Smith et al., 1986; Prober et al., 1987). These 800-nucleotide genetic snippets intrigued, inspired, and motivated me. They contained just enough data to pique my interest (a novel exon, strange repeat, or foreign gene) and always left me craving a bit more: one additional sequencing read to extend that PCR product, find that stop codon, or join those lonely contigs. Usually, it would take weeks or months to get that extra read, and when it arrived I would savor the experience, exploring and analyzing it like a new book from a favorite author. After I devoured the data, I would say to myself, “If only I could get my hands on a great number of sequencing reads from my organism of interest, then all of my genomic woes would be over.” Naively, I believed that the more sequencing data I had, the more productive I would be. Be careful what you wish for from the genome gods.

The onslaught of next-generation sequencing (NGS) technologies (Metzker, 2010; Koboldt et al., 2013) and the access to previously unfathomable amounts of genomic data have made me dizzy, disillusioned, and anything but efficient. Like the proverbial boiling frog, my mind is gradually overheating from an accumulation of NGS reads (Liu et al., 2012). It's a paired-end nightmare, a SOLiD pain in the neck, and a massively parallel migraine. All this HiSeq and MiSeq is clogging up my internal drive and external disks. I've taken vacations and returned home only to find that my Illumina reads still haven't finished downloading. I can't move or back up a FASTQ file without needing a coffee break. Last month it got so bad that I tried calling 911 on my 454. I'm certain that I would have had two Nature papers by now if it weren't for that pestering computer cursor that keeps spinning around and around, reminding me of my small memory and pitiful processing power.

With all this NGS information, what have I gained (apart from being a chronic user of SEQanswers.com)? Well, I'm a co-investigator of half a dozen highly fragmented nuclear genome assemblies for various green algae, with no genome papers anywhere in sight. And don't get me started on the number of transcriptome projects waiting to be written up. What's worse is that I'm still sending more samples for sequencing. It's become my default setting: when in doubt, sequence. If a colleague drops by my office and says, “Smitty, you interested in milkweeds?” my first response is, “You betcha. Let's send some for sequencing.” Student asks: “Professor Smith, do you have any ideas for my honors thesis?” “Hmmm,” I say, “how about we sequence another green alga.” Grant money left over, what do I do? You guessed it: two-for-one RNA-seq at the campus sequencing facility. And if the data come back contaminated or the quality is poor? Easy, I sequence more!

It's gotten to the point where I should begin my conference presentations with, “Hello, my name is David and I'm an NGS addict.”

There are some positives to being NGS obsessed. I'm constantly testing and learning the newest bioinformatics software and genome assembly programs. I know all of the hippest genome slang and genetic acronyms. I have learned more than I ever wanted to about Linux, Unix, and Perl, although, as my students regularly point out, I'm still a hack in all three of those areas. I love that I can go to the Sequence Read Archive at the National Center for Biotechnology Information (Leinonen et al., 2011) (I visit the site incessantly) and in seconds access endless amounts of raw genomic and transcriptomic data from some of the coolest and most bizarre species on earth, and then use these data to mine genes for phylogenetic and other comparative analyses. I'm also an organelle genome junkie, and NGS techniques have made it quick and easy for me to sequence or data mine complete mitochondrial and plastid DNAs from a diversity of interesting taxa throughout the eukaryotic tree of life (Smith, 2012).

Sequencing nuclear DNAs has been a different story. Even with huge datasets, state-of-the-art assembly programs, and intricate annotation pipelines, I'm incapable of producing decent nuclear genome assemblies. It doesn't help that the species I choose to investigate are poorly studied and poorly sequenced. For researchers investigating organisms for which high-quality nuclear genome assemblies already exist (i.e., assemblies based on Sanger sequencing), the payoffs of NGS have been great (Koboldt et al., 2013). Perhaps as sequencing technologies improve, personal computing power increases, and bioinformatics software becomes more user friendly, it will soon be easier for small labs to assemble publication-quality nuclear genomes of non-model taxa.

For now, however, the promises of NGS have, at least for me, not lived up to their hype and often resulted in disappointment, frustration, and a loss of perspective. Don't get me wrong, NGS has revolutionized, accelerated, and, in many ways, simplified scientific research. Moreover, new (and soon to come) long-read technologies will alleviate many of the current limitations of NGS (English et al., 2012), such as the absence of a reference genome map. But no matter how long sequencing reads get, NGS will probably never be the panacea of genetics that some claim it to be (Koboldt et al., 2013).

I was taught to approach research with specific hypotheses and questions in mind. In the good ol' Sanger days it was questions that drove me toward the sequencing data. But now it's the NGS data that drive my questions. I recently sequenced the transcriptome of a saltwater Chlamydomonas alga and have been knocking my head against the laboratory door asking, “What is the best way to market, package, and publish these data?” I'm trapped in a cycle where hypothesis testing is a postscript to senseless sequencing (Smith, 2013).

As we move toward a world with infinite amounts of nucleotide sequence information, beyond bench-top sequencers and hundred-dollar genomes, let's take a moment to remember a simpler time, when staring at a string of nucleotides on a screen was special, worthy of celebration, and something to give us pause. When too much data were the least of our worries, and too little was what kept us creative. When the goal was not to amass but to understand genetic data.

I have a colleague on the inside, works at a big genome-sequencing centre in California. We had lunch recently and during one of my rants he stopped me and said, “Dave, take it easy, we still got them, a whole factory floor of AB3730xl Sanger sequencers!” Later that month, for old times' sake, I sent him a few PCR products, which were kicking around the lab, and, sure enough, 2 weeks later three electropherograms arrived in my inbox, like long-lost friends. Anyway, for all those Sanger sequencing geeks out there, caught in a next-gen maze of short reads and long headaches, this one's for you.

Conflict of interest statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Most cited references

Comparison of Next-Generation Sequencing Systems

Lin Liu, Yinhu Li, Siliang Li … (2012)

With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. At last, applications of NGS are summarized.

Life with 6000 genes.

A Goffeau, B G Barrell, H Bussey … (1996)

The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.

Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri.

Simon E Prochnik, James Umen, Aurora M Nedelcu … (2010)

The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are well suited for the investigation of the evolution of multicellularity and development. We sequenced the 138-mega-base pair genome of V. carteri and compared its approximately 14,500 predicted proteins to those of its unicellular relative Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials and few species-specific protein-coding gene predictions. Volvox is enriched in volvocine-algal-specific proteins, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.

Author and article information

Contributors

David Roy Smith (URI: http://community.frontiersin.org/people/u/53585)

Journal

Journal ID (nlm-ta): Front Genet

Journal ID (iso-abbrev): Front Genet

Journal ID (publisher-id): Front. Genet.

Title: Frontiers in Genetics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-8021

Publication date (Electronic): 22 May 2014

Publication date Collection: 2014

Volume: 5

Electronic Location Identifier: 146

Affiliations

Department of Biology, University of Western Ontario, London, ON, Canada

Author notes

*Correspondence: dsmit242@uwo.ca

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics.

Edited by: Mensur Dlakic, Montana State University, USA

Reviewed by: Jiangxin Wang, Arizona State University, USA; Min Zhao, Vanderbilt University, USA; Thiruvarangan Ramaraj, National Center for Genome Resources, USA

Article

DOI: 10.3389/fgene.2014.00146

PMC ID: 4033051

SO-VID: 789fec58-a55a-4d71-bffa-450884ccb2e3

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 30 March 2014

Date accepted: 05 May 2014

Page count

Figures: 0, Tables: 0, Equations: 0, References: 14, Pages: 2, Words: 1870
