TS-m6A-DL: Tissue-specific identification of N6-methyladenosine sites using a universal deep learning model

Graphical abstract

The first step is data pre-processing, which retrieves the sequences from the raw data. The second step encodes the sequences using one-hot encoding so that the network can read them. The third step is construction of the neural network model, and the last step classifies each sequence as methylated or non-methylated.
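The encoding step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `one_hot_encode`, the channel order A, C, G, U, the conversion of T to U, and the all-zero row for unknown bases are all assumptions made for the example.

```python
# Illustrative one-hot encoding of an RNA sequence (not the authors' code).
# Assumptions: channel order A, C, G, U; T is read as U; unknown bases map to zeros.
ALPHABET = "ACGU"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide, with a single 1 marking the base."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:  # symbols outside the alphabet (e.g. N) stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Show the rows produced for a short example fragment.
for base, row in zip("GGACU", one_hot_encode("GGACU")):
    print(base, row)
```

A sequence of length n becomes an n × 4 binary matrix, which is the kind of numerical input a neural network can consume directly.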

Abstract

N6-methyladenosine (m6A), the most common post-transcriptional modification, is associated with a number of crucial biological processes. Precise detection of m6A sites across the genome is critical for revealing its regulatory function and providing new insights into drug design. Although both experimental and computational methods for detecting m6A sites have been introduced, these conventional methods are laborious and expensive. Furthermore, only a handful of them can detect m6A sites in various tissues. A more generic and optimized computational method for detecting m6A sites in different tissues is therefore required. In this paper, we propose a universal model based on a deep neural network (DNN), named TS-m6A-DL, which can classify m6A sites in several tissues of humans (Homo sapiens), mice (Mus musculus), and rats (Rattus norvegicus). To extract RNA sequence features and convert the input into a numerical format for the network, we used one-hot encoding. The model was tested using fivefold cross-validation, and its stability was measured on independent datasets. TS-m6A-DL achieved accuracies of 75–85% under fivefold cross-validation and 72–84% on the independent datasets. Finally, to verify that the model generalizes, we performed cross-species testing, in which it achieved state-of-the-art results.
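The fivefold cross-validation protocol mentioned in the abstract can be sketched as below. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `accuracy` are hypothetical, the labels are toy data, and a majority-class predictor stands in for the network, whose architecture is not described in this record.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels: 1 = methylated, 0 = non-methylated (made-up data for illustration).
labels = [1, 0] * 10
scores = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    train_labels = [labels[j] for j in train_idx]
    # Placeholder "model": always predict the majority class of the training fold.
    majority = int(sum(train_labels) * 2 >= len(train_labels))
    predictions = [majority] * len(test_idx)
    scores.append(accuracy([labels[j] for j in test_idx], predictions))

print("mean accuracy over folds:", sum(scores) / len(scores))
```

Reporting the mean over the five held-out folds, and then separately scoring an independent dataset that was never used for training, corresponds to the two accuracy ranges quoted in the abstract.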

Author and article information

Contributors

Quan Zou

Kil To Chong

Journal

Journal ID (nlm-ta): Comput Struct Biotechnol J

Journal ID (iso-abbrev): Comput Struct Biotechnol J

Title: Computational and Structural Biotechnology Journal

Publisher: Research Network of Computational and Structural Biotechnology

ISSN (Electronic): 2001-0370

Publication date PMC-release: 10 August 2021

Publication date Collection: 2021

Publication date (Electronic): 10 August 2021

Volume: 19

Pages: 4619-4625

Affiliations

[a] Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea

[b] Institute of Avionics and Aeronautics (IAA), Air University, Islamabad 44000, Pakistan

[c] School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea

[d] Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China

[e] Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea

Author notes

[*] Corresponding authors at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China (Q. Zou); Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea (K.T. Chong). zouquan@nclab.net, kitchong@jbnu.ac.kr

[1] Zeeshan Abbas and Hilal Tayara contributed equally.

Article

Publisher Item ID: S2001-0370(21)00345-7

DOI: 10.1016/j.csbj.2021.08.014

PMC ID: 8383060

PubMed ID: 34471503

SO-VID: 389ca1c4-1b1f-4bff-b23d-e342a054da4e

License:

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

History

Date received: 24 May 2021

Date revision received: 8 August 2021

Date accepted: 9 August 2021
