A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing

      Abstract

      The genomes of many species are diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designed and implemented HapSVAssembler on the basis of a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA. Experimental results indicate that HapSVAssembler has high accuracy and contiguity under various sequencing coverages, error rates, and insert sizes. The program was tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs were identified. We observe that, although the number of SVs is much smaller than that of SNPs, the genomic regions occupied by SVs are much larger, implying that heterozygosity computed using SNPs or the k-mer spectrum may be underestimated.
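
      To make the idea of phasing by optimization concrete, here is a minimal sketch of a GA that searches for a haplotype over biallelic heterozygous sites. It is an illustration only and not HapSVAssembler's implementation: the abstract does not state the fitness function or the genetic operators, so the minimum error correction (MEC) cost, the truncation selection, the one-point crossover, and every name below (mec_cost, phase_with_ga, the toy fragments) are assumptions made for this example. Each fragment stands for the alleles one read pair observes at the sites it covers; the second haplotype is taken to be the bitwise complement of the first.

```python
import random


def mec_cost(hap, fragments):
    """MEC cost (assumed objective): each fragment is charged the smaller
    number of mismatches against `hap` or against its complement."""
    total = 0
    for frag in fragments:
        to_hap = sum(1 for pos, allele in frag.items() if hap[pos] != allele)
        to_comp = len(frag) - to_hap  # mismatches against the complement
        total += min(to_hap, to_comp)
    return total


def phase_with_ga(fragments, n_sites, pop_size=30, generations=60,
                  mutation_rate=0.05, seed=0):
    """Toy GA: returns (best haplotype found, its MEC cost)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    population = [[rng.randint(0, 1) for _ in range(n_sites)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged (elitism).
        population.sort(key=lambda h: mec_cost(h, fragments))
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_sites)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each site independently.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda h: mec_cost(h, fragments))
    return best, mec_cost(best, fragments)


# Hypothetical toy input: six heterozygous sites, six error-free fragments
# drawn from the haplotype below or from its complement.
true_hap = [0, 1, 1, 0, 1, 0]
fragments = [
    {0: 0, 1: 1, 4: 1},
    {1: 0, 2: 0, 5: 1},
    {2: 1, 3: 0},
    {3: 1, 4: 0, 5: 1},
    {0: 1, 3: 1},
    {4: 1, 5: 0},
]
best, cost = phase_with_ga(fragments, n_sites=6)
print(best, cost)
```

      Because a haplotype and its complement explain the fragments equally well, the search may return either one. A real pipeline would also have to handle sequencing errors, multi-allelic sites, and the SVs discussed above, none of which this sketch attempts.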

            Author and article information

            Affiliations
            [1 ]Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
            [2 ]Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
            [3 ]Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan, Taiwan
            [4 ]Institute of Biomedical Sciences, National Chung Hsing University, Taichung, Taiwan
            Xiamen University, CHINA
            Author notes

            Competing Interests: The authors have declared that no competing interests exist.

            • Conceptualization: YTH CKT.

            • Methodology: YTH CKT SYC.

            • Resources: CSL MTC JWC.

            • Software: SYC YTH.

            • Writing – original draft: SYC YTH.

            • Writing – review & editing: YTH CKT.

            Contributors
            Role: Editor
            Journal: PLoS ONE
            Publisher: Public Library of Science (San Francisco, CA, USA)
            ISSN: 1932-6203
            Published: 18 November 2016
            Volume: 11
            Issue: 11
            PMID: 27861560
            PMCID: PMC5115803
            Publisher ID: PONE-D-16-32260
            DOI: 10.1371/journal.pone.0166721
            © 2016 Ting et al

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            Counts
            Figures: 20, Tables: 3, Pages: 24
            Funding
            YTH was supported in part by the Ministry of Science and Technology (MOST) with grant numbers 103-2923-E-194-001-MY3 and 104-2221-E-194-048-MY2.
            Categories
            • Research Article
            • Biology and Life Sciences > Evolutionary Biology > Population Genetics > Haplotypes
            • Biology and Life Sciences > Genetics > Population Genetics > Haplotypes
            • Biology and Life Sciences > Population Biology > Population Genetics > Haplotypes
            • Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > DNA Construction > DNA Library Construction > Genomic Library Construction
            • Research and Analysis Methods > Molecular Biology Techniques > DNA Construction > DNA Library Construction > Genomic Library Construction
            • Biology and Life Sciences > Computational Biology > Genome Analysis > Sequence Assembly Tools
            • Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Sequence Assembly Tools
            • Biology and Life Sciences > Developmental Biology > Genomic Imprinting
            • Biology and Life Sciences > Genetics > Epigenetics > Genomic Imprinting
            • Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
            • Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
            • Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
            • Research and Analysis Methods > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
            • Biology and Life Sciences > Computational Biology > Genome Complexity
            • Biology and Life Sciences > Genetics > Genomics > Genome Complexity
            • Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Alignment
            • Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Alignment
            Custom metadata
            All relevant data are within the paper and its Supporting Information files.
