Sequencing error profiles of Illumina sequencing instruments.

Abstract

Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
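The overlap idea described in the abstract can be illustrated with a minimal sketch. When a sequenced fragment is shorter than the combined length of its two paired-end reads, the end of read 1 and the reverse complement of read 2 cover the same bases, so any disagreement there means at least one read contains an error. The code below is not the authors' pipeline: the function names, the naive ungapped suffix/prefix search, and the `min_len` and `max_mismatch_frac` parameters are all assumptions made for illustration.

```python
# Illustrative sketch only: estimates a mismatch rate from overlapping read pairs.
# All names and thresholds here are hypothetical, not taken from the article.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap(read1, mate_rc, min_len=10, max_mismatch_frac=0.1):
    """Find the longest ungapped overlap between the end of read1 and the
    start of mate_rc (read 2, already reverse-complemented).

    Returns (compared_bases, mismatches), or None if no overlap of at least
    min_len bases stays within the mismatch tolerance. Positions where either
    read has an N are not counted. Fragments shorter than one read length
    (adapter read-through) are not handled by this simple search.
    """
    for length in range(min(len(read1), len(mate_rc)), min_len - 1, -1):
        pairs = [
            (x, y)
            for x, y in zip(read1[-length:], mate_rc[:length])
            if "N" not in (x, y)
        ]
        mismatches = sum(x != y for x, y in pairs)
        if pairs and mismatches <= max_mismatch_frac * len(pairs):
            return len(pairs), mismatches
    return None


def overlap_error_rate(read_pairs, **kwargs):
    """Pool all overlaps and return mismatches per compared base.

    read_pairs is an iterable of (read1, read2) strings as they would appear
    in paired FASTQ files. Returns None if no pair overlaps.
    """
    compared = mismatched = 0
    for read1, read2 in read_pairs:
        hit = best_overlap(read1, reverse_complement(read2), **kwargs)
        if hit is None:
            continue
        compared += hit[0]
        mismatched += hit[1]
    return mismatched / compared if compared else None
```

A mismatch in the overlap does not say which read is wrong, so the pooled rate is closer to the sum of the two reads' error rates than to a single-read rate. A real analysis would more likely locate overlaps from alignments and would need to account for base quality, indels, and adapter read-through.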

Author and article information

Journal

Journal ID (iso-abbrev): NAR Genom Bioinform

Title: NAR genomics and bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 2631-9268

ISSN (Print): 2631-9268

Publication date (Electronic): Mar 2021

Volume: 3

Issue: 1

Affiliations

[1] Graduate Program in Bioinformatics and Genomics, The Huck Institutes for Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.

[2] Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.

Article

Publisher Item ID: lqab019

DOI: 10.1093/nargab/lqab019

PMC ID: 8002175

PubMed ID: 33817639

SO-VID: 21d4aa69-7f9d-448d-af0a-7c424e7c4b6d
