      FMLRC: Hybrid long read error correction using an FM-index

      research-article

          Abstract

          Background

          Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy.

          Results

          We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods.
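The abstract names a multi-string Burrows-Wheeler Transform with an FM-index but gives no implementation detail. As a rough illustration of that data structure only, the sketch below builds a naive BWT over a small set of short reads and counts k-mer occurrences with FM-index backward search. It is not the FMLRC implementation: the function names (`build_bwt`, `build_fm_index`, `count_kmer`), the `$` terminator, and the quadratic construction are all assumptions chosen for readability.

```python
from collections import Counter


def build_bwt(reads):
    """Naive multi-string BWT: concatenate reads, each ending in '$',
    sort all cyclic rotations, and take the last column.
    (Illustrative only; real tools use far more efficient construction.)"""
    text = "".join(read + "$" for read in reads)
    order = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
    # The last character of the rotation starting at i is text[i - 1].
    return "".join(text[i - 1] for i in order)


def build_fm_index(bwt):
    """Return (first, occ) where first[c] is the number of symbols smaller
    than c and occ[c][i] is the number of occurrences of c in bwt[:i]."""
    alphabet = sorted(set(bwt))
    counts = Counter(bwt)
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, symbol in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if symbol == c else 0)
    return first, occ


def count_kmer(kmer, first, occ, n):
    """Count occurrences of kmer across all reads by backward search."""
    lo, hi = 0, n
    for c in reversed(kmer):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    short_reads = ["ACGTACGT", "CGTAC", "TTACG"]  # hypothetical toy data
    bwt = build_bwt(short_reads)
    first, occ = build_fm_index(bwt)
    for kmer in ("ACG", "CGT", "GGG"):
        print(kmer, count_kmer(kmer, first, occ, len(bwt)))
```

Because every read ends in `$` and a query k-mer never contains `$`, a match cannot span two reads. One plausible use in hybrid correction, stated here as an assumption rather than as the authors' method, is to treat k-mers from a long read that are rare or absent in the short-read index as likely errors.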

          Conclusion

          Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Greater throughput and computational efficiency than existing methods will help make more economical use of emerging long read sequencing technologies.

          Most cited references (12)

          Improving PacBio Long Read Accuracy by Short Read Alignment

          The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads to reduce the error rate more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
            A survey of error-correction methods for next-generation sequencing.

            Error correction is important for most next-generation sequencing applications because highly accurate sequenced reads will likely lead to higher quality results. Many techniques for error correction of sequencing data from next-gen platforms have been developed in recent years. However, compared with the fast development of sequencing technologies, there is a lack of standardized evaluation procedures for different error-correction methods, making it difficult to assess their relative merits and demerits. In this article, we provide a comprehensive review of many error-correction methods, and establish a common set of benchmark data and evaluation criteria to provide a comparative assessment. We present experimental results on quality, run-time, memory usage and scalability of several error-correction methods. Apart from providing explicit recommendations useful to practitioners, the review serves to identify the current state of the art and promising directions for future research. All error-correction programs used in this article are downloaded from hosting websites. The evaluation tool kit is publicly available at: http://aluru-sun.ece.iastate.edu/doku.php?id=ecr.
              Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

              Heng Li (2015)
              Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10kbp in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads.
              Results: We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold C. elegans data in 9 minutes, orders of magnitude faster than the existing pipelines. We also introduce a pairwise read mapping format (PAF) and a graphical fragment assembly format (GFA), and demonstrate the interoperability between ours and current tools.
              Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm
              Contact: hengli@broadinstitute.org

                Author and article information

                Contributors
                jeremy_wang@med.unc.edu
                holtjma@cs.unc.edu
                mcmillan@cs.unc.edu
                cdjones@email.unc.edu
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                9 February 2018
                2018; 19: 50
                Affiliations
                [1] ISNI 0000000122483208, GRID grid.10698.36, Department of Genetics, University of North Carolina at Chapel Hill; CB 3280, 3144 Genome Sciences Building, 250 Bell Tower Dr, Chapel Hill, NC 27599, USA
                [2] ISNI 0000000122483208, GRID grid.10698.36, Department of Computer Science, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
                [3] ISNI 0000000122483208, GRID grid.10698.36, Department of Biology and Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
                Article
                2051
                10.1186/s12859-018-2051-3
                5807796
                29426289
                085d115f-9226-4bed-880e-8b377d58b2a6
                © The Author(s) 2018

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 8 February 2017
                Accepted: 1 February 2018
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000001, National Science Foundation;
                Award ID: DEB-1457707
                Funded by: FundRef http://dx.doi.org/10.13039/100005562, North Carolina Biotechnology Center;
                Award ID: 2013-MRG-1110
                Funded by: University Cancer Research Fund
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: P50 GM076468
                Funded by: FundRef http://dx.doi.org/10.13039/100000062, National Institute of Diabetes and Digestive and Kidney Diseases;
                Award ID: T32DK007737-17S1
                Categories
                Methodology Article
                Bioinformatics & Computational biology
                de novo assembly, hybrid error correction, long read, PacBio, BWT, FM-index
