      fastp: an ultra-fast all-in-one FASTQ preprocessor


      Bioinformatics

      Oxford University Press


          Abstract

          Motivation

          Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient.

          Results

          We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools.
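The single-scan idea can be illustrated with a minimal Python sketch. This is not fastp's C++ implementation: the function names, the mean-quality criterion, the thresholds `min_mean_quality` and `min_length`, and the Phred+33 encoding are all assumptions chosen for illustration. The point is only that statistics and filtering are computed in the same pass over the records, so the file is read once.

```python
import io

def mean_phred(quality_line, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)

def single_pass_filter(handle, min_mean_quality=20, min_length=15):
    """Read each 4-line FASTQ record once; gather stats and filter together.

    The thresholds are hypothetical illustration values, not fastp defaults.
    """
    stats = {"reads_in": 0, "reads_out": 0, "bases_in": 0}
    kept = []
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Quality-control statistics and filtering share the same scan.
        stats["reads_in"] += 1
        stats["bases_in"] += len(seq)
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            kept.append((header, seq, qual))
            stats["reads_out"] += 1
    return kept, stats

# Two toy records: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO(
    "@read1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGTACGT\n+\n####################\n"
)
kept, stats = single_pass_filter(example)
```

In this toy input the first read passes and the second is dropped for low mean quality, while the counters cover both reads without a second scan.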

          Availability and implementation

          The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.


          Most cited references 4


          Detecting ultralow-frequency mutations by Duplex Sequencing.

          Duplex Sequencing (DS) is a next-generation sequencing methodology capable of detecting a single mutation among >1 × 10^7 wild-type nucleotides, thereby enabling the study of heterogeneous populations and very-low-frequency genetic alterations. DS can be applied to any double-stranded DNA sample, but it is ideal for small genomic regions of <1 Mb in size. The method relies on the ligation of sequencing adapters harboring random yet complementary double-stranded nucleotide sequences to the sample DNA of interest. Individually labeled strands are then PCR-amplified, creating sequence 'families' that share a common tag sequence derived from the two original complementary strands. Mutations are scored only if the variant is present in the PCR families arising from both of the two DNA strands. Here we provide a detailed protocol for efficient DS adapter synthesis, library preparation and target enrichment, as well as an overview of the data analysis workflow. The protocol typically takes 1-3 d.
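The scoring rule in this abstract (a variant counts only if both strand families support it) can be sketched in Python as follows. This is a simplified illustration, not the published analysis workflow: it assumes tags have already been normalized so both strands of a molecule share one tag, that bases are expressed on the same reference strand, and that a simple majority vote defines a family consensus. The strand labels `"top"` and `"bottom"` and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def family_consensus(bases):
    """Majority base within one PCR family; None when the top two tie."""
    counts = Counter(bases).most_common(2)
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def duplex_calls(observations, reference_base):
    """Score variants at one position from (tag, strand, base) observations.

    A variant is reported for a tag only if the families from both strands
    exist and agree on the same non-reference base.
    """
    families = defaultdict(list)
    for tag, strand, base in observations:
        families[(tag, strand)].append(base)

    calls = {}
    for tag in {tag for tag, _ in families}:
        if (tag, "top") not in families or (tag, "bottom") not in families:
            continue  # single-strand support only: not scored
        top = family_consensus(families[(tag, "top")])
        bottom = family_consensus(families[(tag, "bottom")])
        if top is not None and top == bottom and top != reference_base:
            calls[tag] = top
    return calls

observations = [
    ("T1", "top", "A"), ("T1", "top", "A"),
    ("T1", "bottom", "A"), ("T1", "bottom", "A"),   # both strands agree
    ("T2", "top", "A"), ("T2", "top", "A"),
    ("T2", "bottom", "G"), ("T2", "bottom", "G"),   # one strand only
    ("T3", "top", "A"),                             # no partner family
]
calls = duplex_calls(observations, reference_base="G")
```

Here only tag `T1` yields a call, because `T2` shows the variant on a single strand and `T3` lacks a partner family.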

            AfterQC: automatic filtering, trimming, error removing and quality control for fastq data

            Background: Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling.

            Results: For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer’s bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.

            Conclusion: Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases.
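A rough Python sketch of overlap-based correction for a read pair is shown below. It is an illustration of the general idea only, not AfterQC's actual algorithm: the offset search, the thresholds (`min_overlap`, `max_mismatches`, `high_q`, `low_q`) and every function name are assumptions. Read 2 is reverse-complemented, aligned against read 1, and a mismatch is corrected only when one base has high quality and the other low quality.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_overlap(r1, r2_rc, min_overlap=10, max_mismatches=2):
    """Return the first offset in r1 where r2_rc aligns, or None."""
    for offset in range(len(r1)):
        length = min(len(r1) - offset, len(r2_rc))
        if length < min_overlap:
            break  # remaining overlaps can only get shorter
        mismatches = sum(r1[offset + i] != r2_rc[i] for i in range(length))
        if mismatches <= max_mismatches:
            return offset
    return None

def correct_by_overlap(r1, q1, r2, q2, high_q=30, low_q=15):
    """Correct mismatches in the overlap; q1 and q2 are lists of Phred ints.

    Thresholds are hypothetical illustration values.
    Returns (corrected_r1, corrected_r2, number_of_corrections).
    """
    r2_rc, q2_rev = reverse_complement(r2), q2[::-1]
    offset = find_overlap(r1, r2_rc)
    if offset is None:
        return r1, r2, 0
    a_bases, b_bases = list(r1), list(r2_rc)
    corrected = 0
    for i in range(min(len(r1) - offset, len(r2_rc))):
        a, b = a_bases[offset + i], b_bases[i]
        if a == b:
            continue
        qa, qb = q1[offset + i], q2_rev[i]
        if qa >= high_q and qb <= low_q:
            b_bases[i] = a          # trust read 1
            corrected += 1
        elif qb >= high_q and qa <= low_q:
            a_bases[offset + i] = b  # trust read 2
            corrected += 1
    return "".join(a_bases), reverse_complement("".join(b_bases)), corrected

truth = "ACGTACGTACGTTTGA"
r1 = "ACGTAGGTACGTTTGA"            # low-quality error at index 5 (C read as G)
q1 = [40] * 16
q1[5] = 10
r2 = reverse_complement(truth)     # error-free mate covering the same fragment
q2 = [40] * 16
fixed_r1, fixed_r2, n_fixed = correct_by_overlap(r1, q1, r2, q2)
```

In this toy pair the low-quality mismatch in read 1 is replaced by the high-quality base from read 2, and read 2 is returned unchanged.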

              The Emerging Role of "Liquid Biopsies," Circulating Tumor Cells, and Circulating Cell-Free Tumor DNA in Lung Cancer Diagnosis and Identification of Resistance Mutations.

              Therapeutic advances in the treatment of lung cancer are in part due to a more complete understanding of its genomic portrait. The serial monitoring of tumor genotypes, which are instable and prone to changes under selective pressure, is becoming increasingly needed. Although tumor biopsies remain the reference standard for the diagnosis and genotyping of lung cancer, they are invasive and not always feasible. The "liquid biopsies" have the potential to overcome many of these hurdles, allowing a rapid and accurate identification of de novo and resistant genetic alterations and a real-time monitoring of treatment responses. In this review, we provide insights into new liquid diagnostic platforms and discuss the role of circulating tumor cells and circulating tumor DNA in the diagnosis and identification of resistance mutations in lung cancer.

                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                ISSN: 1367-4803, 1367-4811
                01 September 2018
                08 September 2018
                Volume: 34
                Issue: 17
                Pages: i884-i890
                Affiliations
                [1 ]Department of Bioinformatics, HaploX Biotechnology, Shenzhen, China
                [2 ]Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
                Author notes
                To whom correspondence should be addressed. E-mail: chen@haplox.com
                Article
                Article ID: bty560
                DOI: 10.1093/bioinformatics/bty560
                PMC ID: 6129281
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                Page count
                Pages: 7
                Funding
                Funded by: Special Funds for Future Industries of Shenzhen
                Award ID: JSGG20160229123927512
                Funded by: National Science Foundation of China 10.13039/501100001809
                Award ID: 61472411
                Categories
                ECCB 2018: European Conference on Computational Biology Proceedings
                Data

                Bioinformatics & Computational biology
