      Is Open Access

      Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

      research-article


          Abstract

          The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.


                Author and article information

                Journal
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                ISSN (print): 0305-1048
                ISSN (online): 1362-4962
                02 December 2019
                04 October 2019
                04 October 2019
                Volume: 47
                Issue: 21
                Pages: 10994-11006
                Affiliations
                [1] Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
                [2] Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
                [3] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
                [4] Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
                [5] Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
                [6] Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
                [7] Institut de Biologie Computationnelle, 34095 Montpellier, France
                [8] Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
                [9] Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
                [10] Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
                [11] Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
                Author notes
                To whom correspondence should be addressed. Tel: +47 22857654; Email: dirk.linke@ibv.uio.no
                Author information
                http://orcid.org/0000-0002-1932-8212
                http://orcid.org/0000-0003-0235-9810
                http://orcid.org/0000-0003-3663-2352
                http://orcid.org/0000-0001-6650-1711
                http://orcid.org/0000-0002-6982-4660
                http://orcid.org/0000-0003-1887-7209
                http://orcid.org/0000-0002-2342-6886
                http://orcid.org/0000-0003-3352-4831
                http://orcid.org/0000-0002-8861-5397
                http://orcid.org/0000-0003-3150-6752
                Article
                Publisher ID: gkz841
                DOI: 10.1093/nar/gkz841
                PMCID: PMC6868369
                PMID: 31584084
                f4fe1996-57f0-4b5d-bff0-00d35776062e
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 01 October 2019
                Revised: 03 September 2019
                Received: 07 June 2019
                Page count
                Pages: 13
                Funding
                Funded by: Research Council of Norway 10.13039/501100005416
                Award ID: 251076
                Funded by: University of Oslo 10.13039/501100005366
                Funded by: Institute of Informatics
                Funded by: European Union through the European Social Fund
                Award ID: POWR.03.02.00-00-I029
                Categories
                Survey and Summary

                Genetics
