EPI-SF: essential protein identification in protein interaction networks using sequence features

Abstract

Essential proteins are considered indispensable for an organism's viability, reproductive capability, and other fundamental physiological functions. Conventional biological assays for identifying essential proteins are slow, labor-intensive, and expensive. Computational methods are therefore widely regarded as a faster and more cost-effective way to identify essential proteins. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not adopted in this sequence-feature-based work because high-quality training sets of positive and negative samples are scarce. Some recent DL work does address limited data, and exploring it is left as future work. Conventional ML techniques are therefore used here, since they performed better than DL methodologies in this setting. On this basis, a technique called EPI-SF is proposed, which employs ML to identify essential proteins within the protein-protein interaction network (PPIN). The protein sequence is the primary determinant of protein structure and function, so relevant sequence features are first extracted from the proteins within the PPIN. These features are then used as input to several ML models, namely the XGBoost classifier, the AdaBoost classifier, logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and Naïve Bayes (NB), with the objective of detecting the essential proteins within the PPIN. The primary investigation was conducted on yeast and compared the performance of these ML models on the yeast PPIN. Among them, the RF model was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperformed other state-of-the-art methods, both those based on traditional centrality measures such as betweenness centrality (BC) and closeness centrality (CC) and deep learning methods such as DeepEP, as detailed in the Results section. Given this favorable performance, EPI-SF is subsequently employed to predict novel essential proteins in the human PPIN. Because viruses tend to selectively target essential proteins involved in disease transmission within the human PPIN, the probable involvement of these predicted proteins in COVID-19 and other related severe diseases is also investigated.
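The feature-extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sequence features EPI-SF uses, so amino acid composition is assumed here purely for illustration, and the function names `amino_acid_composition`, `precision_recall_f1`, and `f1_from` are hypothetical. The sketch only shows how a sequence can become a fixed-length feature vector for a classifier such as RF, and how precision, recall, and F1 relate to one another.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes (assumed feature alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-value vector: the fraction of each standard residue.

    Illustrative feature only; not confirmed as the feature set of EPI-SF.
    Non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = essential)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    features = amino_acid_composition("MKKA")
    print(len(features), features[AMINO_ACIDS.index("K")])
    # Consistency check on the reported RF precision and recall.
    print(round(f1_from(0.703, 0.720), 3))
```

Applying `f1_from` to the reported RF precision (0.703) and recall (0.720) gives a value that rounds to 0.711, which agrees with the F1-score stated in the abstract.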

Author and article information

Contributors

Sovan Saha

Subhadip Basu

Journal

Journal ID (nlm-ta): PeerJ

Journal ID (iso-abbrev): PeerJ

Journal ID (publisher-id): peerj

Title: PeerJ

Publisher: PeerJ Inc. (San Diego, USA)

ISSN (Electronic): 2167-8359

Publication date (Electronic): 13 March 2024

Publication date Collection: 2024

Volume: 12

Electronic Location Identifier: e17010

Affiliations

[1] Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Techno Main Salt Lake, Kolkata, West Bengal, India

[2] Department of Computer Science & Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India

[3] Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India

Article

Publisher ID: 17010

DOI: 10.7717/peerj.17010

PMC ID: 10944162

PubMed ID: 38495766

SO-VID: 747e9cfc-03b3-4f5a-9b07-9b4ce223e8fb

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

History

Date received: 24 November 2023

Date accepted: 5 February 2024

Funding

Funded by: Computer Science and Engineering Department, Jadavpur University, India

Funded by: Department of Biotechnology project

Award ID: BT/PR16356/BID/7/596/2016

Funded by: Ministry of Science and Technology, Government of India

The authors received infrastructure support from the “Center for Microprocessor Applications for Training Education and Research” research laboratory of the Computer Science and Engineering Department, Jadavpur University, India. This project is also supported by the Department of Biotechnology project (No. BT/PR16356/BID/7/596/2016), Ministry of Science and Technology, Government of India. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Related collections

Novel Coronavirus Disease COVID-19