      Discovery of large genomic inversions using long range information

      research-article


          Abstract

          Background

          Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies.

          Results

          Here we propose a novel algorithm, Valor, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of Valor using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of Valor against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data.

          Conclusions

          In this paper, we show that Valor is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using Valor, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization.

          Valor is available at https://github.com/BilkentCompGen/Valor
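
          The claim of recovering all validated inversions at a low false discovery rate can be made concrete with a small evaluation sketch. The code below is not part of Valor and does not reproduce the authors' evaluation; it assumes that a predicted inversion matches a validated one when the two intervals share at least 50% reciprocal overlap, and that any unmatched prediction counts as a false discovery. The function names, the threshold, and the coordinates are illustrative assumptions.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the intervals are disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def evaluate(
    predicted: List[Interval],
    validated: List[Interval],
    min_overlap: float = 0.5,  # assumed matching threshold, not taken from the paper
) -> Tuple[float, float]:
    """Return (recall, false discovery rate) for a set of predicted inversions."""
    # Predictions that match at least one validated inversion.
    true_pos = sum(
        1
        for p in predicted
        if any(reciprocal_overlap(p, v) >= min_overlap for v in validated)
    )
    # Validated inversions that are matched by at least one prediction.
    recovered = sum(
        1
        for v in validated
        if any(reciprocal_overlap(p, v) >= min_overlap for p in predicted)
    )
    recall = recovered / len(validated) if validated else 0.0
    fdr = (len(predicted) - true_pos) / len(predicted) if predicted else 0.0
    return recall, fdr


# Toy coordinates invented for illustration only.
validated_calls = [("chr1", 1000, 5000), ("chr2", 200, 900)]
predicted_calls = [("chr1", 1100, 5200), ("chr2", 250, 950), ("chr3", 10, 500)]
recall, fdr = evaluate(predicted_calls, validated_calls)
print(f"recall={recall:.2f}, FDR={fdr:.2f}")
```

          Under these assumptions a validated set that is incomplete inflates the false discovery rate, since a genuine but previously unknown inversion is counted as a false call until it is confirmed by other means, such as the fluorescent in situ hybridization used here.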

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12864-016-3444-1) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                francesca.antonacci@uniba.it
                calkan@cs.bilkent.edu.tr
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN 1471-2164
                10 January 2017
                Volume 18, Article 65
                Affiliations
                [1] Department of Computer Engineering, Bilkent University, Bilkent, 06800 Ankara, Turkey
                [2] Department of Biology, University of Bari, Via Orabona 4, 70125 Bari, Italy
                [3] Benaroya Research Institute, 1201 Ninth Avenue, 98101 Seattle, WA, USA
                [4] Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, 3720 15th Avenue NE, 98195 Seattle, WA, USA
                Author information
                http://orcid.org/0000-0002-5443-0706
                Article
                3444
                10.1186/s12864-016-3444-1
                5223412
                9d02b934-72ad-4f5f-8642-8d8cbf446eb5
                © The Author(s). 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 20 September 2016
                Accepted: 19 December 2016
                Funding
                Funded by: Seventh Framework Programme (FundRef http://dx.doi.org/10.13039/501100004963); Award ID: MCIRG 303772
                Funded by: EMBO (FundRef http://dx.doi.org/10.13039/501100003043); Award ID: IG-2521
                Funded by: National Human Genome Research Institute (FundRef http://dx.doi.org/10.13039/100000051); Award ID: HG004120
                Funded by: Firb Futuro in Ricerca; Award ID: RBFR103CE3
                Categories
                Methodology Article
                Genetics
                structural variation, long range sequencing, linked-reads, inversion, read clouds
