3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      DeepSimulator: a deep simulator for Nanopore sequencing

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Motivation

          Oxford Nanopore sequencing is a rapidly developed sequencing technology in recent years. To keep pace with the explosion of the downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals.

          Results

          Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83 to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection.

          Availability and implementation

          The software can be accessed freely at: https://github.com/lykaust15/DeepSimulator.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

          Related collections

          Most cited references19

          • Record: found
          • Abstract: found
          • Article: not found

          A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

          Heng Li (2011)
          Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of the next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty. We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors. http://samtools.sourceforge.net. hengli@broadinstitute.org.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Framewise phoneme classification with bidirectional LSTM and other neural network architectures.

            In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Color indexing

                Bookmark

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Bioinformatics
                bioinformatics
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 September 2018
                06 April 2018
                06 April 2018
                : 34
                : 17
                : 2899-2908
                Affiliations
                [1 ]Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
                [2 ]Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
                Author notes
                To whom correspondence should be addressed. E-mail: xin.gao@ 123456kaust.edu.sa or sheng.wang@ 123456kaust.edu.sa
                Article
                bty223
                10.1093/bioinformatics/bty223
                6129308
                29659695
                aa7217ed-c0e8-43fc-9e8c-31baba5e65ba
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 13 December 2017
                : 31 March 2018
                : 04 April 2018
                Page count
                Pages: 10
                Funding
                Funded by: King Abdullah University of Science and Technology 10.13039/501100004052
                Funded by: KAUST 10.13039/501100004052
                Funded by: Office of Sponsored Research
                Funded by: OSR
                Award ID: FCC/1/1976-04
                Award ID: URF/1/2602-01
                Award ID: URF/1/3007-01
                Award ID: URF/1/3412-01
                Award ID: URF/1/3450-01
                Categories
                Original Papers
                Genome Analysis

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article