      STEAK: A specific tool for transposable elements and retrovirus detection in high-throughput sequencing data


          Abstract

          Advances in high-throughput genomics have revealed much about the human genome, highlighting the importance of variation between individuals and its contribution to disease. Although numerous software tools have been developed to make sense of large genomics datasets, a major shortcoming of these has been their inability to cope with repetitive regions, specifically to validate structural variants and accordingly assess their role in disease. Here we describe our program STEAK, massively parallel software designed to detect chimeric reads in high-throughput sequencing data for a broad range of applications, such as determining the presence or absence, as well as the discovery, of transposable elements (TEs) and retroviral integrations. We highlight the capabilities of STEAK by comparing its efficacy in locating HERV-K HML-2 in clinical whole genome projects, target enrichment sequences, and the 1000 Genomes CEU Trio with the performance of other TE and virus detection tools. We show that STEAK outperforms other software in terms of computational efficiency, sensitivity, and specificity. We demonstrate that STEAK is a robust tool that allows analysts to flexibly detect and evaluate TE and retroviral integrations in a diverse range of sequencing projects for both research and clinical purposes.
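          To make the term "chimeric read" concrete, the following is a minimal sketch and not STEAK's actual implementation. It assumes a chimeric read is one that contains a short sequence from the edge of the element (here a made-up toy string, `TE_EDGE`) preceded by enough host flanking sequence to be mapped. The function names, the `min_flank` parameter, and the exact-match search are all illustrative assumptions; a real tool would tolerate mismatches and also handle the opposite junction of the element.

          ```python
          # Toy illustration of splitting a chimeric read into host flank and TE part.
          # All names and sequences here are hypothetical, not taken from STEAK.

          _COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


          def reverse_complement(seq):
              """Return the reverse complement of a DNA sequence."""
              return seq.translate(_COMPLEMENT)[::-1]


          def split_at_te_edge(read, te_edge, min_flank=5):
              """Split a read at the first exact occurrence of te_edge.

              Returns (host_flank, te_part), or None when the edge is absent
              or the host flank is shorter than min_flank bases.
              """
              pos = read.find(te_edge)
              if pos < min_flank:  # also covers pos == -1 (edge not found)
                  return None
              return read[:pos], read[pos:]


          def find_chimeric_reads(reads, te_edge, min_flank=5):
              """Check each read in both orientations and collect the splits."""
              hits = []
              for read in reads:
                  for oriented in (read, reverse_complement(read)):
                      split = split_at_te_edge(oriented, te_edge, min_flank)
                      if split is not None:
                          hits.append(split)
                          break  # report each read at most once
              return hits


          TE_EDGE = "TGTGGGG"  # made-up toy edge sequence for the example
          reads = [
              "ACGTACGTACTGTGGGGAAA",   # host flank followed by the TE edge
              "ACGTACGTACGTACGT",       # host only, no TE edge
              "ACTGTGGGGAAA",           # TE edge present, flank too short
          ]
          hits = find_chimeric_reads(reads, TE_EDGE)
          ```

          In this sketch the host flank is the part that would subsequently be mapped back to the reference genome to locate the integration site; the element part serves only to recognise the junction.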


          Most cited references (22)


          ART: a next-generation sequencing read simulator.

          ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.

            Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.

            High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run is first "mapped" (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher usable sequence yield and improved accuracy compared to that of existing software.

              Target-enrichment strategies for next-generation sequencing.

              We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.

                Author and article information

                Journal
                Virus Evolution (Virus Evol)
                Publisher: Oxford University Press
                ISSN: 2057-1577
                July 2017
                21 August 2017
                Volume: 3
                Issue: 2
                Article number: vex023
                Affiliations
                [1] Department of Zoology, University of Oxford, Oxfordshire, UK
                [2] Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, UK
                [3] Nuffield Department of Medicine, University of Oxford, Oxfordshire, UK
                [4] Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
                Article
                DOI: 10.1093/ve/vex023
                PMCID: PMC5597868
                PMID: 28948042
                © The Author 2017. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                Pages: 12
                Funding
                Funded by: Medical Research Council 10.13039/501100000265
                Award ID: MR/K010565/1 to GM
                Keywords: endogenous retroviruses, evolution, transposons, HTS, virus integration, mobile element
