      Repetitive DNA and next-generation sequencing: computational challenges and solutions

      Nature Reviews Genetics

      Springer Science and Business Media LLC

          Abstract

          Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
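The alignment ambiguity described in the abstract can be illustrated with a minimal sketch. The reference sequence, the repeat unit, and the helper `find_all` below are all made up for illustration, and exact string matching stands in for a real read aligner. A short read that falls entirely inside a repeat matches every copy, whereas a read long enough to reach unique flanking sequence matches only once.

```python
def find_all(reference: str, read: str) -> list:
    """Return every 0-based start position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Toy reference (hypothetical): two copies of one repeat unit,
# separated and flanked by unique sequence.
REPEAT = "ACGTTGCA"
reference = "GGATC" + REPEAT + "TTAGC" + REPEAT + "CCTGA"

short_read = "CGTTGC"        # lies entirely inside the repeat
long_read = "ATC" + REPEAT   # extends into the unique left flank

short_hits = find_all(reference, short_read)
long_hits = find_all(reference, long_read)

print("short read maps to", len(short_hits), "locations:", short_hits)
print("long read maps to", len(long_hits), "location:", long_hits)
```

In this toy case the short read cannot be placed unambiguously, which is the situation that forces a mapper to report all locations, pick one at random, or discard the read.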

          Most cited references (39)

          The transcriptional landscape of the yeast genome defined by RNA sequencing.

          The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

            Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.

            High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run is first "mapped" (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher usable sequence yield and improved accuracy compared to that of existing software.

              Identification of novel transcripts in annotated genomes using RNA-Seq.

              We describe a new 'reference annotation based transcript assembly' problem for RNA-Seq data that involves assembling novel transcripts in the context of an existing annotation. This problem arises in the analysis of expression in model organisms, where it is desirable to leverage existing annotations for discovering novel transcripts. We present an algorithm for reference annotation-based transcript assembly and show how it can be used to rapidly investigate novel transcripts revealed by RNA-Seq in comparison with a reference annotation. The methods described in this article are implemented in the Cufflinks suite of software for RNA-Seq, freely available from http://bio.math.berkeley.edu/cufflinks. The software is released under the BOOST license. Contact: cole@broadinstitute.org; lpachter@math.berkeley.edu. Supplementary data are available at Bioinformatics online.

                Author and article information

                Journal: Nature Reviews Genetics (Nat Rev Genet)
                Publisher: Springer Science and Business Media LLC
                ISSN: 1471-0056, 1471-0064
                Issue date: January 2012
                Published online: November 29 2011
                Volume: 13
                Issue: 1
                Pages: 36-46
                Type: Article
                DOI: 10.1038/nrg3117
                PMCID: 3324860
                PMID: 22124482
                © 2012
