      Is Open Access

      Improving PacBio Long Read Accuracy by Short Read Alignment

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Recently developed third-generation sequencing (TGS) generates much longer reads than second-generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback of most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) using SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs on the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads and reduce the error rate more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC achieves over double the sensitivity at similar specificity.
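
          The homopolymer compression (HC) transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not LSC's actual implementation: the function names `homopolymer_compress` and `homopolymer_expand` and the example sequences are hypothetical, chosen only to show the idea that collapsing each run of identical bases to a single base makes two reads that differ only in homopolymer length look identical to an aligner.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse each run of identical bases to one base.

    Returns the compressed sequence and the original run lengths,
    so the transformation can be reversed after alignment.
    (Illustrative helper; not taken from the LSC source code.)
    """
    bases = []
    run_lengths = []
    for base, run in groupby(seq):
        bases.append(base)
        run_lengths.append(len(list(run)))
    return "".join(bases), run_lengths


def homopolymer_expand(compressed, run_lengths):
    """Rebuild the original sequence from its compressed form."""
    return "".join(base * n for base, n in zip(compressed, run_lengths))


# Hypothetical example: a short read and a long read that differ
# only in the lengths of two homopolymer runs.
short_read = "AAGGGTCCCCA"
long_read = "AAGGTCCCCCA"  # one G dropped, one C inserted

sr_hc, sr_runs = homopolymer_compress(short_read)
lr_hc, lr_runs = homopolymer_compress(long_read)

print(sr_hc, sr_runs)
print(lr_hc, lr_runs)
print(sr_hc == lr_hc)
```

          In this toy case both reads compress to the same string, so an alignment in compressed space is unaffected by the homopolymer-length errors; the stored run lengths keep the information needed to map back to the uncompressed sequence.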


          Author and article information

          Contributors
          Role: Editor
          Journal: PLoS ONE
          Publisher: Public Library of Science (San Francisco, USA)
          ISSN: 1932-6203
          Published: 4 October 2012
          Volume: 7
          Issue: 10
          Article number: e46679
          Affiliations
          [1] Department of Statistics, Stanford University, Stanford, California, United States of America
          [2] Pacific Biosciences of California, Menlo Park, California, United States of America
          University of Iowa, United States of America
          Author notes

          Competing Interests: J.G.U. and L.L. are full-time employees and stock holders of Pacific Biosciences, a company commercializing single-molecule, real-time nucleic acid sequencing technologies.

          Conceived and designed the experiments: KFA WHW. Performed the experiments: JGU LL. Analyzed the data: KFA WHW. Contributed reagents/materials/analysis tools: KFA. Wrote the paper: KFA WHW.

          Article
          Manuscript number: PONE-D-12-16737
          DOI: 10.1371/journal.pone.0046679
          PMCID: 3464235
          PMID: 23056399
          Copyright © 2012

          This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

          History
          Received: 8 June 2012
          Accepted: 2 September 2012
          Page count
          Pages: 8
          Funding
          This work was supported by the National Institutes of Health [R01HG005717 to K.F.A. and R01HD057970 to W.H.W.]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
          Categories
          Research Article
          Biology
          Biochemistry
          Nucleic Acids
          RNA
          Computational Biology
          Biological Data Management
          Sequence Analysis
          Molecular Cell Biology
          Nucleic Acids
          RNA

