
      MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell

      research-article

          Abstract

          Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at the subfamily levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
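The idea of allocating multi-mapping reads by flanking context can be illustrated with a toy sketch. This is not the MATES model: the abstract describes a deep-learning approach, whereas the code below swaps in a simple proportional heuristic that weights each candidate locus by the number of uniquely mapped reads in its flanking window. The function name `allocate_multimappers`, the locus identifiers, and all counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def allocate_multimappers(flank_counts, multi_reads, pseudocount=1.0):
    """Fractionally assign each multi-mapping read to its candidate TE loci.

    flank_counts: dict mapping locus id -> number of uniquely mapped reads
                  in the window flanking that locus (hypothetical input).
    multi_reads:  list of lists; each inner list holds the candidate loci
                  of one multi-mapping read.
    pseudocount:  added to every flank count so that loci without flanking
                  reads can still receive a small share.

    Returns a dict mapping locus id -> fractional read count.
    """
    allocated = defaultdict(float)
    for candidates in multi_reads:
        weights = [flank_counts.get(locus, 0) + pseudocount for locus in candidates]
        total = sum(weights)
        if total == 0:
            # No flanking evidence at all: fall back to a uniform split.
            weights = [1.0] * len(candidates)
            total = float(len(candidates))
        for locus, weight in zip(candidates, weights):
            allocated[locus] += weight / total
    return dict(allocated)


# Hypothetical example: three copies of one TE subfamily at different loci.
flank_counts = {"L1_chr1_100": 30, "L1_chr5_900": 10, "L1_chrX_50": 0}
multi_reads = (
    [["L1_chr1_100", "L1_chr5_900"]] * 4   # four reads ambiguous between two loci
    + [["L1_chr5_900", "L1_chrX_50"]] * 2  # two reads ambiguous between two others
)
counts = allocate_multimappers(flank_counts, multi_reads, pseudocount=0.0)
for locus, value in sorted(counts.items()):
    print(f"{locus}\t{value:.2f}")
```

Unlike a 'best-mapped' or 'random-mapped' assignment, each read here contributes fractionally to every candidate locus, so the per-locus totals still sum to the number of multi-mapping reads. A learned model would replace the proportional weights with predicted allocation probabilities.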

          Abstract

          Transposable elements (TEs) pose challenges for quantification due to multi-mapping reads. Here, the authors present MATES, a deep learning method that accurately assigns reads to specific TE loci, enhancing TE quantification in single-cell omics datasets and identifying marker TEs in cell populations.

                Author and article information

                Contributors
                tao.wu@bcm.edu
                jun.ding@mcgill.ca
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 11 October 2024
                Volume: 15
                Article number: 8798
                Affiliations
                [1] School of Computer Science, McGill University (https://ror.org/01pxwe438), Montreal, Quebec, Canada
                [2] Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre (GRID grid.63984.30, ISNI 0000 0000 9064 4811), Montreal, Quebec, Canada
                [3] Department of Medicine, McGill University (https://ror.org/01pxwe438), Montreal, Quebec, Canada
                [4] Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University (https://ror.org/01pxwe438), Montreal, Quebec, Canada
                [5] Department of Molecular and Human Genetics, Baylor College of Medicine (https://ror.org/02pttbw34), Houston, TX, USA
                [6] Department of Neurosurgery, Baylor College of Medicine (https://ror.org/02pttbw34), Temple, TX, USA
                [7] College of Medicine and Irma Lerma Rangel College of Pharmacy, Texas A&M University (https://ror.org/01f5ytq51), College Station, TX, USA
                [8] LIVESTRONG Cancer Institutes and Department of Oncology, Dell Medical School, The University of Texas at Austin (https://ror.org/00hj54h04), Austin, TX, USA
                [9] MyCellome LLC, Pittsburgh, PA, USA
                [10] Mila-Quebec AI Institute, Montreal, Quebec, Canada
                [11] Present address: Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health (https://ror.org/05wevan27), Temple, TX, USA
                Author information
                http://orcid.org/0009-0005-2146-2292
                http://orcid.org/0009-0008-4580-5247
                http://orcid.org/0000-0002-9859-4534
                http://orcid.org/0000-0001-5183-6885
                Article
                Publisher ID: 53114
                DOI: 10.1038/s41467-024-53114-7
                PMC ID: 11470080
                PubMed ID: 39394211
                ScienceOpen ID: 58eb48dd-953c-43b2-8268-d5fb26b7df8d
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

                History
                Received: 25 September 2023
                Accepted: 24 September 2024
                Funding
                Funded by: FundRef 501100000024, Gouvernement du Canada | Canadian Institutes of Health Research (Instituts de Recherche en Santé du Canada);
                Award ID: PJT180505
                Funded by: FundRef 100008240, Fonds de Recherche du Québec-Société et Culture (FRQSC);
                Award ID: 295298, 295299
                Funded by: FundRef 501100000038, Gouvernement du Canada | Natural Sciences and Engineering Research Council of Canada (Conseil de Recherches en Sciences Naturelles et en Génie du Canada);
                Award ID: RGPIN2022-04399
                Funded by: Meakins-Christie Chair in Respiratory Research
                Funded by: CPRIT award (RR180072)
                Categories
                Article
                Custom metadata
                © Springer Nature Limited 2024

                Uncategorized
                computational models, genome informatics, software, machine learning, gene regulation
