SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed-up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq; a process whereby a set of uniquely barcoded DNA fragments are added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples.

Results

By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination.

As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error.

Conclusion

SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.

Related collections

Most cited references 23

Record: found
Abstract: found
Article: not found

Detection of ultra-rare mutations by next-generation sequencing.

Michael W Schmitt, Scott Kennedy, Jesse Salk … (2012)

Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ~1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when "deep sequencing" genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.

0 comments Cited 384 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

A large genome center's improvements to the Illumina sequencing system.

Michael Quail, Iwanka Kozarewa, Frances Smith … (2008)

The Wellcome Trust Sanger Institute is one of the world's largest genome centers, and a substantial amount of our sequencing is performed with 'next-generation' massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases. Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high-throughput environment, to reduce bias, tighten insert size distribution and reliably obtain high yields of data.

0 comments Cited 300 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

Jonas Binladen, M Gilbert, Jonathan Bollback … (2007)

Background The invention of the Genome Sequence 20™ DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. Methodology We use conventional PCR with 5′-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5′tag-analysis. Conclusions We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5′ nucleotide of the tag. In particular, primers 5′ labelled with a cytosine are heavily overrepresented among the final sequences, while those 5′ labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5′primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.

0 comments Cited 183 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Michael A Quail

Miriam Smith

David Jackson

Steven Leonard

Thomas Skelly

Harold P Swerdlow

Yong Gu

Peter Ellis

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central

ISSN (Electronic): 1471-2164

Publication date Collection: 2014

Publication date (Electronic): 7 February 2014

Volume: 15

Page: 110

Affiliations

[1 ]Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambs, UK

[2 ]Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Bldg. 427, 21702-1201 Frederick, MD, USA

Article

Publisher ID: 1471-2164-15-110

DOI: 10.1186/1471-2164-15-110

PMC ID: 4008303

PubMed ID: 24507442

SO-VID: 51dad83a-b9f3-4355-95f1-8168ae60cf3b

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing

Read this article at

Abstract

Background

Results

Conclusion

Related collections

Arabidopsis genomics

Most cited references 23

Detection of ultra-rare mutations by next-generation sequencing.

A large genome center's improvements to the Illumina sequencing system.

The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

Author and article information

Contributors

Journal

Affiliations

Article

History

Categories

Comments

Comment on this article

Similar content 192

Cited by 25

Most referenced authors 1,057