
      Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology

      research-article


          Abstract

          Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
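The percentages quoted in the abstract follow directly from the reported counts. As a quick consistency check, the Python sketch below recomputes them from those counts. The dictionary labels and the `percent` helper are illustrative names chosen here; they are not part of PALMER or of the paper.

```python
# Counts as reported in the abstract: (numerator, denominator).
# The labels are paraphrases used only for this check.
reported_counts = {
    "absent from previous short-read studies": (90, 203),
    "within reference LINE-1 sequences": (73, 90),
    "genotypable in 1000 Genomes Project sequences": (57, 90),
    "detected under single-cell under-amplification": (65, 203),
}


def percent(numerator: int, denominator: int) -> int:
    """Return the ratio as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


for label, (num, den) in reported_counts.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}%")
```

The rounded values for 73/90, 57/90 and 65/203 match the 81%, 63% and 32% stated in the abstract. The abstract gives 90/203 only as a count; it works out to roughly 44% of the non-reference insertions.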


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                20 February 2020
                19 December 2019
                Volume: 48
                Issue: 3
                Pages: 1146-1163
                Affiliations
                [1] Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
                [2] Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
                [3] Molecular and Behavioral Neuroscience Institute, University of Michigan Medical School, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
                [4] Department of Internal Medicine, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 734-647-9628; Email: remills@umich.edu
                Author information
                http://orcid.org/0000-0003-4755-1072
                http://orcid.org/0000-0002-9631-1465
                http://orcid.org/0000-0002-5308-4864
                http://orcid.org/0000-0003-3425-6998
                Article
                Article ID: gkz1173
                DOI: 10.1093/nar/gkz1173
                PMCID: PMC7026601
                PMID: 31853540
                8773edcc-95b5-4f72-808c-5451d6bf1af8
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Accepted: 05 December 2019
                Revised: 14 November 2019
                Received: 07 August 2019
                Page count
                Pages: 18
                Funding
                Funded by: National Institutes of Health 10.13039/100000002
                Award ID: MH106892
                Award ID: HG000040
                Categories
                Computational Biology

                Genetics
