
      Fast and accurate correction of optical mapping data via spaced seeds


          Abstract

          Motivation

          Optical mapping data is used in many core genomics applications, including structural variation detection, scaffolding of assembled contigs and mis-assembly detection. However, the pervasiveness of spurious and deleted cut sites in the raw data, which are called Rmaps, makes assembling and aligning Rmaps challenging. Although another method for error-correcting Rmap data, named cOMet, exists, it cannot scale to even moderately large genomes. The main challenge in error correction is determining which pairs of Rmaps originate from the same region of the same genome.

          Results

          We create an efficient method for determining pairs of Rmaps that share significant overlaps. Our method relies on a novel and nontrivial adaptation and application of spaced seeds to optical mapping, which allows spurious and deleted cut sites to be accounted for. We apply our method to detecting and correcting these errors. The resulting error correction method, referred to as Elmeri, improves upon the results of state-of-the-art correction methods and does so in a fraction of the time. More specifically, cOMet required 9.9 CPU days to error correct Rmap data generated from the human genome, whereas Elmeri required less than 15 CPU hours and improved the quality of the Rmaps by more than four times compared to cOMet.
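The idea of a spaced seed that tolerates spurious and deleted cut sites can be illustrated with a small sketch. This is not Elmeri's implementation: the function names, the window length, the care intervals, the bin size and the toy fragment lengths (in kb) are all assumptions made for illustration. The sketch anchors a fixed-length window at each cut site, ignores cut sites that fall outside the "care" intervals of the seed, quantizes the remaining offsets to absorb small sizing errors, and reports Rmaps that share a seed key as candidate overlapping pairs.

```python
from collections import defaultdict
from itertools import combinations


def cut_positions(fragments):
    """Return cumulative cut-site positions for a list of fragment lengths."""
    positions = [0.0]
    for length in fragments:
        positions.append(positions[-1] + length)
    return positions


def seed_keys(fragments, window, care_intervals, bin_size):
    """Yield one key per window anchored at a cut site.

    Cut sites whose offset lies outside every care interval are ignored,
    so a spurious or deleted cut site there does not change the key.
    """
    positions = cut_positions(fragments)
    total = positions[-1]
    for start in positions:
        if start + window > total:
            break
        kept = [
            p - start
            for p in positions
            if start <= p <= start + window
            and any(lo <= p - start <= hi for lo, hi in care_intervals)
        ]
        # Quantize offsets so that small sizing errors map to the same bin.
        yield tuple(round(offset / bin_size) for offset in kept)


def candidate_pairs(rmaps, window, care_intervals, bin_size, min_sites=3):
    """Return pairs of Rmap names that share at least one seed key."""
    index = defaultdict(set)
    for name, fragments in rmaps.items():
        for key in seed_keys(fragments, window, care_intervals, bin_size):
            if len(key) >= min_sites:  # skip keys with too few cut sites
                index[key].add(name)
    pairs = set()
    for names in index.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


# Hypothetical toy data: "b" is "a" with one spurious cut site at 20 kb.
rmaps = {
    "a": [10, 20, 15, 30],
    "b": [10, 10, 10, 15, 30],
    "c": [7, 33, 12, 23],
}
pairs = candidate_pairs(
    rmaps, window=45, care_intervals=[(0, 12), (28, 45)], bin_size=5
)
print(pairs)
```

In this toy setting the spurious cut site in "b" falls in the don't-care region between 12 kb and 28 kb, so "a" and "b" still produce a common key, while the unrelated "c" does not. A practical method would also need to handle offsets near bin boundaries and to verify candidate pairs with a proper alignment.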

          Availability and implementation

          Elmeri is publicly available under the GNU Affero General Public License at https://github.com/LeenaSalmela/Elmeri.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 February 2020
                03 September 2019
                Volume: 36
                Issue: 3
                Pages: 682-689
                Affiliations
                [1 ] Department of Computer Science, Helsinki Institute for Information Technology HIIT , FI-00014 University of Helsinki, Helsinki 00100, Finland
                [2 ] Department of Computer and Information Science and Engineering , University of Florida, Gainesville, FL 32611, USA
                [3 ] Department of Computer Science, Colorado State University , Fort Collins, CO 80523, USA
                Author notes
                To whom correspondence should be addressed. leena.salmela@cs.helsinki.fi or cboucher@cise.ufl.edu
                Author information
                http://orcid.org/0000-0002-0756-543X
                http://orcid.org/0000-0002-1647-8741
                http://orcid.org/0000-0002-9283-0049
                Article
                Article ID: btz663
                DOI: 10.1093/bioinformatics/btz663
                PMC ID: 7005598
                PMID: 31504206
                © The Author(s) 2019. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 10 April 2019
                Revised: 25 July 2019
                Accepted: 30 August 2019
                Page count
                Pages: 8
                Funding
                Funded by: Academy of Finland 10.13039/501100002341
                Award ID: 308030
                Award ID: 314170
                Award ID: 294143
                Award ID: 319454
                Funded by: National Science Foundation 10.13039/100000001
                Award ID: 1618814
                Categories
                Original Papers
                Genome Analysis

                Bioinformatics & Computational biology
