      Open Access

      Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

      research-article


          Abstract

          Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
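The abstract notes that error correctors differ in the features of the data they analyse and in the data structures they use. One widely used family of methods counts k-mers (length-k substrings) across all reads and treats rare k-mers as likely sequencing errors rather than true variation. The sketch below illustrates that general idea only. It is not a method taken from this article.

The following are hypothetical illustrative choices:

- the function names `count_kmers`, `weak_kmer_starts` and `suspect_bases`;
- the values of `k` and the fixed `min_count` threshold;
- the toy reads.

Many real tools instead derive thresholds from the k-mer count histogram and the sequencing coverage.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read, counts, k, min_count):
    """Start positions of k-mers in `read` seen fewer than `min_count` times.

    A Counter returns 0 for unseen keys, so k-mers absent from the
    counted read set are also reported as weak.
    """
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


def suspect_bases(weak_starts, k):
    """Bases covered by every weak k-mer window.

    Crude, hypothetical heuristic: it only localizes the error when all
    weak k-mers stem from a single substitution.
    """
    if not weak_starts:
        return []
    first = max(weak_starts)          # latest window start
    last = min(weak_starts) + k - 1   # earliest window end
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Toy data: five error-free copies plus one read with A->T at index 4.
    reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]
    counts = count_kmers(reads, k=4)
    weak = weak_kmer_starts("ACGTTCGTAC", counts, k=4, min_count=2)
    print(weak)                        # [1, 2, 3, 4]
    print(suspect_bases(weak, k=4))    # [4]
```

A single substitution makes every k-mer that overlaps the changed base rare, so the weak windows intersect at index 4. With uneven coverage, genuine low-frequency variants or systematic platform-specific errors, one global threshold is a strong simplification. This is exactly the kind of assumption the abstract says should be checked against the data.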

                Author and article information

Journal: Briefings in Bioinformatics (Brief Bioinform), Oxford University Press
ISSN: 1467-5463; 1477-4054
Issue date: January 2016; published online 29 May 2015
Volume 17, Issue 1 (Special Issue: Current Progress in Bioinformatics 2016), pages 154-179
Author notes
Corresponding author: Alice McHardy, Head of department, Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig. Tel.: +49 531 6181-1430; Fax: +49 531 6181-1499; E-mail: Alice.McHardy@helmholtz-hzi.de
Article ID: bbv029
DOI: 10.1093/bib/bbv029
PMCID: PMC4719071
PMID: 26026159
                © The Author 2015. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History: 3 March 2015; 9 April 2015
Page count: 26
Categories: Software Review; Bioinformatics & Computational biology
Keywords: next-generation sequencing, high-throughput sequencing, error profile, error correction, error model, bias
