A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning


Abstract

Background

Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes among a large number of candidates with laboratory methods is costly and time-consuming. Many machine learning-based gene prioritization methods exist. They differ in several respects, including the feature vectors used to represent genes, the datasets used and their structures, and the learning model. Key challenges for these methods are constructing suitable gene feature vectors, designing a learning model that works on diverse data with non-Euclidean structures such as graphs, and coping with the lack of negative data. Graph neural networks have recently emerged in machine learning and related fields and have shown superior performance on a broad range of problems.

Methods

In this study, we present a new semi-supervised learning method based on graph convolutional networks that uses a novel way of constructing feature vectors for each gene. First, we construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. The model learns hidden-layer representations that encode both the local graph structure and the node features, so it considers the topology of the biological network (e.g., PPI) and other sources of evidence simultaneously. Finally, we validate the method to demonstrate its efficiency.
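The abstract does not give the network architecture. As a minimal sketch, assuming the widely used two-layer GCN propagation rule of Kipf and Welling (softmax(Â · ReLU(Â · X · W1) · W2), where Â is the symmetrically normalized adjacency with self-loops), the code below shows how PPI topology and per-gene features are combined, and how a semi-supervised loss is computed only over labeled genes. The toy graph, features, weights, and the function names are hypothetical; the weights are untrained, and a real model would learn W1 and W2 by gradient descent on this loss.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCNs."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def gcn_forward(adj, features, w1, w2):
    """Two-layer GCN: softmax(A_norm . ReLU(A_norm . X . W1) . W2)."""
    a_norm = normalized_adjacency(adj)
    hidden = relu(matmul(matmul(a_norm, features), w1))
    return softmax_rows(matmul(matmul(a_norm, hidden), w2))

def masked_cross_entropy(probs, labels, labeled_idx):
    """Mean negative log-likelihood over labeled nodes only (semi-supervised)."""
    return -sum(math.log(probs[i][labels[i]]) for i in labeled_idx) / len(labeled_idx)

# Hypothetical toy PPI network: a path 0-1-2-3 of four proteins.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical gene features
w1 = [[0.5, -0.2], [0.1, 0.4]]   # untrained, hypothetical weights
w2 = [[0.3, -0.3], [-0.1, 0.2]]
probs = gcn_forward(adj, x, w1, w2)  # class 1 = disease gene, class 0 = not
# Only genes 0 (known disease gene) and 3 (assumed non-disease) are labeled;
# genes 1 and 2 are unlabeled but still receive predictions.
loss = masked_cross_entropy(probs, {0: 1, 3: 0}, [0, 3])
```

Because the normalized adjacency mixes each gene's features with those of its interaction partners, unlabeled genes receive scores informed by their network neighborhood, which is what makes the training semi-supervised.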

Results

We performed several experiments on 16 diseases to evaluate the proposed method. Compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods, our method achieves the best precision, area under the ROC curve (AUC), and F1-score.
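The abstract reports precision, AUC, and F1-score but not the decision threshold or how results are averaged across diseases. The sketch below, assuming binary labels (1 = known disease gene) and a hypothetical 0.5 threshold on model scores, shows how these three metrics are computed for one disease; AUC is computed as the probability that a randomly chosen positive gene is scored above a randomly chosen negative gene, with ties counted as one half. The data are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores for six candidate genes.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

p, r, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(p, 3), round(r, 3), round(f1, 3), round(auc, 3))  # 0.667 0.667 0.667 0.889
```

Precision and F1 depend on the chosen threshold, whereas AUC depends only on the ranking of genes, which is why AUC is a natural fit for a prioritization task.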

Conclusion

This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
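The abstract names the three GO aspects used (molecular function, cellular component, and biological process) but does not describe how the authors turn terms into vectors; their construction is presented as novel. As a plain baseline only, not the authors' method, the sketch below builds one binary indicator vector per aspect for each gene, with one position per GO term observed in that aspect. The gene names, term IDs, and function names are hypothetical placeholders, and how the three vectors are combined as GCN input is not stated in the abstract.

```python
ASPECTS = ("MF", "CC", "BP")  # molecular function, cellular component, biological process

def build_vocab(annotations):
    """Collect a sorted list of GO term IDs per aspect across all genes."""
    vocab = {aspect: set() for aspect in ASPECTS}
    for gene_terms in annotations.values():
        for term, aspect in gene_terms:
            vocab[aspect].add(term)
    return {aspect: sorted(terms) for aspect, terms in vocab.items()}

def go_feature_vectors(annotations, vocab):
    """Return, for each gene, one binary indicator vector per GO aspect."""
    index = {a: {t: i for i, t in enumerate(vocab[a])} for a in ASPECTS}
    features = {}
    for gene, gene_terms in annotations.items():
        vecs = {a: [0] * len(vocab[a]) for a in ASPECTS}
        for term, aspect in gene_terms:
            vecs[aspect][index[aspect][term]] = 1  # mark the term as present
        features[gene] = vecs
    return features

# Hypothetical annotations: gene -> list of (GO term ID, aspect).
annotations = {
    "GENE_A": [("GO:mf1", "MF"), ("GO:cc1", "CC"), ("GO:bp1", "BP"), ("GO:bp2", "BP")],
    "GENE_B": [("GO:mf2", "MF"), ("GO:cc1", "CC"), ("GO:bp2", "BP")],
}
vocab = build_vocab(annotations)
features = go_feature_vectors(annotations, vocab)
```

Here `features["GENE_A"]` is `{"MF": [1, 0], "CC": [1], "BP": [1, 1]}`. Keeping the aspects separate preserves the distinction between what a gene product does, where it acts, and which processes it takes part in.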


Author and article information

Contributors

Saeid Azadifar: saeid.azadifar@email.kntu.ac.ir

Ali Ahmadi: ahmadi@kntu.ac.ir

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Electronic): 14 October 2022

Publication date PMC-release: 14 October 2022

Publication date Collection: 2022

Volume: 23

Electronic Location Identifier: 422

Affiliations

Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran (GRID grid.411976.c, ISNI 0000 0004 0369 2065)

Article

Publisher ID: 4954

DOI: 10.1186/s12859-022-04954-x

PMC ID: 9563530

PubMed ID: 36241966

SO-VID: 62fb815a-c85f-42b2-a319-26529b760e33

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received : 12 July 2022

Date accepted : 20 September 2022

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: gene prioritization, graph convolutional networks, protein–protein interaction, semi-supervised learning, gene identification

